[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Monotone-devel] net.venge.monotone.deregexp
From: |
Nathaniel Smith |
Subject: |
Re: [Monotone-devel] net.venge.monotone.deregexp |
Date: |
Wed, 28 Feb 2007 00:00:55 -0800 |
User-agent: |
Mutt/1.5.13 (2006-08-11) |
On Tue, Feb 27, 2007 at 06:43:32PM -0800, Zack Weinberg wrote:
> On 2/27/07, Nathaniel Smith <address@hidden> wrote:
> >On Sun, Feb 25, 2007 at 01:32:36AM -0800, Zack Weinberg wrote:
> >> Specifically, uri.cc no longer uses 'em, and a
> >> spurious #include of <boost/regex.hpp> has been removed from
> >> netsync.cc. This could probably be flushed to mainline all by itself,
> >> but I didn't.
> >
> >Any reason why not?
>
> I wanted another pair of eyes on it, especially now that there's a
> conversion of packet.cc too (which code I am not entirely happy with).
Ah, I didn't see the magic words "please review for mainline" or
similar.
> >> * diff_patch.cc: for the show-containing-function feature.
> >
> >...How can we possibly eliminate this? It's necessary if we want to
> >be consistent with diff(1), and I can't think of any other approach
> >that would be nearly as convenient anyway... (i.e., trying to use
> >globs here does not seem like a winning plan).
>
> We *could* make this operation a lua hook (more of one than it is now,
> anyway) and use lua's string.match thing. That implements a very
> basic regular expression, as far as I can tell. It has some odd
> syntax (% instead of \) but the big thing it seems to be missing is
> alternation.
>
> I'm not all that enthusiastic about this idea, but I think it's worth
> considering.
Okay.
> >if we are keeping some regexp engine around, that
> >might simplify some other things. (Don't have to rewrite the globish
> >matcher by hand, etc.)
>
> I was thinking about grabbing the fnmatch() implementation from gnulib for
> that.
> [ http://www.gnu.org/software/gnulib/MODULES.html#module=fnmatch ]
That might make sense for .mtn-ignore as well (or in any case),
assuming we want to use something like real globs there. globish
is... not quite like real globs. (No [] character matching, but with
{} alternations. And we can't trivially change it, because it's in
the network protocol.)
> Weren't you worried about unbounded recursion with pcre?
At least somewhat. It's less of a concern if we're not reading
regexen out of .mtn-ignore (= potentially untrusted content off the
network). If the user wants to define a get_encloser_pattern hook
that smashes their stack, then, umm... I hope they have fun with that.
There are at other workarounds to explore, too. pcrestack suggests
possibilities:
-- setting limits on the recursion depth
-- getting it to use heap instead of stack ("makes it run a lot more
slowly", but dunno if we care)
-- use pcre_dfa_exec instead of pcre_exec (complex trade-offs)
The first of these is likely sufficient :-).
Getting rid of regex matching in .mtn-ignore also removes the concern
about changing the details of our regex matching in the future.
-- Nathaniel
--
Details are all that matters; God dwells there, and you never get to
see Him if you don't struggle to get them right. -- Stephen Jay Gould