bug-gnulib

Re: A little more regex.h pedantry


From: Eric Blake
Subject: Re: A little more regex.h pedantry
Date: Fri, 30 Jul 2010 16:33:47 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100720 Fedora/3.1.1-1.fc13 Lightning/1.0b2pre Mnenhy/0.8.3 Thunderbird/3.1.1

On 07/30/2010 04:09 PM, Reuben Thomas wrote:
> Sigh. I've been picking my way through this paragraph, looking at the code:
> 
> /* This data structure represents a compiled pattern.  Before calling
>    the pattern compiler, the fields `buffer', `allocated', `fastmap',
>    `translate', and `no_sub' can be set.  After the pattern has been
>    compiled, the `re_nsub' field is available.  All other fields are
>    private to the regex routines.  */
> 
> We observed earlier that there is an omission on the "After" side:
> not_bol and not_eol are respected during pattern matching (the
> equivalent of POSIX eflags).
> 
> What I have only just noticed, and confirmed from the code, is that
> the list of fields that can be set before compilation is excessive. In
> particular, `fastmap' can't be set (you have to call
> re_compile_fastmap,

GNU m4 sets fastmap before calling re_compile_pattern, then later calls
re_compile_fastmap:

  if (!word_regexp.fastmap)
    word_regexp.fastmap = xcharalloc (UCHAR_MAX + 1);
  msg = re_compile_pattern (regexp, len, &word_regexp);
  assert (!msg);
  re_set_registers (&word_regexp, &regs, regs.num_regs, regs.start,
                    regs.end);
  if (re_compile_fastmap (&word_regexp))
    assert (false);

Thus, a single buffer is reused across multiple patterns, and the
fastmap is only allocated once.  Furthermore, you can't compile the
fastmap until after the initial pattern is compiled, but the m4 code is
proof that you can assign fastmap prior to compilation.
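
For anyone who wants to try this outside of m4, here is a minimal,
self-contained sketch of the same pattern.  It assumes glibc's (or
gnulib's) <regex.h> with the GNU API exposed via _GNU_SOURCE; the
helpers set_pattern and find_first are hypothetical names of mine, not
m4's, and plain malloc stands in for xcharalloc:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* expose the GNU regex API in <regex.h> */
#endif
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* One buffer reused for every pattern, as in the m4 snippet.  Static
   storage zero-initializes it, so `buffer', `fastmap' and `translate'
   start out NULL and `allocated' starts out 0.  */
static struct re_pattern_buffer word_regexp;

/* Hypothetical helper: compile PATTERN into the shared buffer.
   Returns 0 on success, -1 on failure.  */
static int
set_pattern (const char *pattern)
{
  const char *msg;

  assert (pattern != NULL);

  /* Assign the fastmap once, before the first compilation.  */
  if (!word_regexp.fastmap)
    {
      word_regexp.fastmap = malloc (UCHAR_MAX + 1);
      if (!word_regexp.fastmap)
        return -1;
    }

  msg = re_compile_pattern (pattern, strlen (pattern), &word_regexp);
  if (msg)
    return -1;

  /* The fastmap can only be filled in once a compiled pattern exists.  */
  return re_compile_fastmap (&word_regexp) == 0 ? 0 : -1;
}

/* Hypothetical helper: offset of the first match in TEXT, or -1.  */
static int
find_first (const char *text)
{
  int len = (int) strlen (text);
  return (int) re_search (&word_regexp, text, len, 0, len, NULL);
}
```

Calling set_pattern twice with different patterns should leave
word_regexp.fastmap pointing at the same allocation both times.  Note
that this sketch, like the m4 snippet, makes no attempt to release
whatever internal state the previous pattern left behind.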

> and `no_sub' can't be set (because re_compile
> always overwrites it, as it does newline_anchor).

Did you compile a pattern with grouping ()?  I'm not sure, but the
behavior on no_sub may be conditional on whether there are any
subexpressions to return in the first place.
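
A quick way to settle it is to set no_sub before compiling and look at
it afterwards.  The probe below is only a sketch under my own
assumptions: it needs the GNU API from glibc's or gnulib's <regex.h>,
it relies on the default syntax (re_syntax_options left at 0), where
groups are spelled \( \), and no_sub_after_compile is a made-up helper
name.  I am deliberately not claiming what it reports:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* expose the GNU regex API in <regex.h> */
#endif
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: set no_sub to REQUESTED, compile PATTERN, and
   return the value of no_sub afterwards (0 or 1), or -1 if the pattern
   does not compile.  The subexpression count is stored in *NSUB_OUT.  */
static int
no_sub_after_compile (const char *pattern, int requested, size_t *nsub_out)
{
  struct re_pattern_buffer buf;
  const char *msg;
  int result;

  /* Start from an all-zero buffer: no storage, no fastmap, no translate.  */
  memset (&buf, 0, sizeof buf);
  buf.no_sub = requested ? 1 : 0;

  msg = re_compile_pattern (pattern, strlen (pattern), &buf);
  if (msg)
    return -1;

  result = buf.no_sub;
  *nsub_out = buf.re_nsub;

  regfree (&buf);
  return result;
}
```

Running it with requested set to 1 on both "\\(a\\)b" and plain "ab"
(as C string literals) would show whether the presence of a group makes
any difference to what happens to no_sub.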

> 
> Does this analysis look right?

Not quite, judging by the way m4 uses regex.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
