monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] net.venge.monotone.deregexp


From: Nathaniel Smith
Subject: Re: [Monotone-devel] net.venge.monotone.deregexp
Date: Wed, 28 Feb 2007 00:00:55 -0800
User-agent: Mutt/1.5.13 (2006-08-11)

On Tue, Feb 27, 2007 at 06:43:32PM -0800, Zack Weinberg wrote:
> On 2/27/07, Nathaniel Smith <address@hidden> wrote:
> >On Sun, Feb 25, 2007 at 01:32:36AM -0800, Zack Weinberg wrote:
> >> Specifically, uri.cc no longer uses 'em, and a
> >> spurious #include of <boost/regex.hpp> has been removed from
> >> netsync.cc. This could probably be flushed to mainline all by itself,
> >> but I didn't.
> >
> >Any reason why not?
> 
> I wanted another pair of eyes on it, especially now that there's a
> conversion of packet.cc too (which code I am not entirely happy with).

Ah, I didn't see the magic words "please review for mainline" or
similar.

> >> * diff_patch.cc: for the show-containing-function feature.
> >
> >...How can we possibly eliminate this?  It's necessary if we want to
> >be consistent with diff(1), and I can't think of any other approach
> >that would be nearly as convenient anyway... (i.e., trying to use
> >globs here does not seem like a winning plan).
> 
> We *could* make this operation a lua hook (more of one than it is now,
> anyway) and use lua's string.match thing.  That implements a very
> basic regular expression, as far as I can tell.  It has some odd
> syntax (% instead of \) but the big thing it seems to be missing is
> alternation.
> 
> I'm not all that enthusiastic about this idea, but I think it's worth
> considering.

Okay.

> >if we are keeping some regexp engine around, that
> >might simplify some other things.  (Don't have to rewrite the globish
> >matcher by hand, etc.)
> 
> I was thinking about grabbing the fnmatch() implementation from gnulib for 
> that.
> [ http://www.gnu.org/software/gnulib/MODULES.html#module=fnmatch ]

That might make sense for .mtn-ignore as well (or in any case),
assuming we want to use something like real globs there.  globish
is... not quite like real globs.  (No [] character matching, but with
{} alternations.  And we can't trivially change it, because it's in
the network protocol.)

> Weren't you worried about unbounded recursion with pcre?

At least somewhat.  It's less of a concern if we're not reading
regexen out of .mtn-ignore (= potentially untrusted content off the
network).  If the user wants to define a get_encloser_pattern hook
that smashes their stack, then, umm... I hope they have fun with that.

There are at other workarounds to explore, too.  pcrestack suggests
possibilities:
  -- setting limits on the recursion depth
  -- getting it to use heap instead of stack ("makes it run a lot more
     slowly", but dunno if we care)
  -- use pcre_dfa_exec instead of pcre_exec (complex trade-offs)

The first of these is likely sufficient :-).

Getting rid of regex matching in .mtn-ignore also removes the concern
about changing the details of our regex matching in the future.

-- Nathaniel

-- 
Details are all that matters; God dwells there, and you never get to
see Him if you don't struggle to get them right. -- Stephen Jay Gould




reply via email to

[Prev in Thread] Current Thread [Next in Thread]