[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: gawk regex stuff you may want
From: |
Aharon Robbins |
Subject: |
Re: gawk regex stuff you may want |
Date: |
Wed, 20 Jan 2016 21:59:35 +0200 |
User-agent: |
Heirloom mailx 12.5 6/20/10 |
Hi Paul.
> Here are comments on the parts of the diff not incorporated in this round:
>
> > -static const char __re_error_msgid[] =
> > +static const char __re_error_msgid[] attribute_hidden =
>
> Since the constant is static, there should be no need for attribute_hidden,
> as
> the constant is used only in this compilation unit. Similarly for other
> introductions of attribute_hidden.
Thanks, I'll review that.
> > - re_dfa_t *dfa = bufp->buffer;
> > + re_dfa_t *dfa = (re_dfa_t *) bufp->buffer;
>
> This cast is not needed, as bufp->buffer is already of the proper type.
> Similarly elsewhere.
I don't see that. In regex.h:
| struct re_pattern_buffer
| {
| /* Space that holds the compiled pattern. It is declared as
| `unsigned char *' because its elements are sometimes used as
| array indexes. */
| unsigned char *__REPB_PREFIX(buffer);
| ...
| typedef struct re_pattern_buffer regex_t;
And this is:
| static bin_tree_t *
| parse (re_string_t *regexp, regex_t *preg, reg_syntax_t syntax,
| reg_errcode_t *err)
| {
| re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
So the cast seems necessary.
> > - preg->buffer = dfa;
> > + preg->buffer = (unsigned char *) dfa;
>
> This cast seems counterproductive, as dfa is already of the proper type
> and the unsigned char * is not the proper type.
Same argument for the cast.
> > - dfa->subexp_map = re_malloc (Idx, preg->re_nsub);
> > + /* some malloc()-checkers don't like zero allocations */
> > + if (preg->re_nsub > 0)
> > + dfa->subexp_map = re_malloc (int, preg->re_nsub);
> > + else
> > + dfa->subexp_map = NULL;
> > +
>
> Since malloc (0) is well-defined to return either NULL or a valid pointer
> to a zero-byte object that can be freed, the code is working as-is.
Not on older systems. Although yes, I'm not sure I still support such
systems. I don't know which checker complained. I think it was valgrind
at some point in the past.
> I'd rather look for a solution that involves silencing the checking
> without making the code bulkier and typically slower.
This is in compilation, not execution; not sure the difference
is really noticable.
> > + /*
> > + * Fedora Core 2, maybe others, have broken `btowc' that returns -1
> > + * for any value > 127. Sigh. Note that `start_ch' and `end_ch' are
> > + * unsigned, so we don't have sign extension problems.
> > + */
> > start_wc = ((start_elem->type == SB_CHAR || start_elem->type ==
> > COLL_SYM)
> > - ? __btowc (start_ch) : start_elem->opr.wch);
> > + ? start_ch : start_elem->opr.wch);
> > end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
> > - ? __btowc (end_ch) : end_elem->opr.wch);
> > + ? end_ch : end_elem->opr.wch);
> > if (start_wc == WEOF || end_wc == WEOF)
> > return REG_ECOLLATE;
>
> I don't see how this patch is helpful. If btowc returns -1 we are looking
> at an encoding error, and we can't treat that byte as if it were a
> character. In some single-byte locales, btowc returns 1 for values < 128,
> and they're encoding errors too. Perhaps you could mention a use case?
Gawk's profile5 test fails without this. This code way predates that
test though.
If a byte value given inside [...] is greater than 127, it should
be left alone to be matched as-is (no arbitrary limits).
> > /* Build single byte matching table for this equivalence class. */
> > + char_buf[1] = (unsigned char) '\0';
>
> This should be unnecessary, as the rest of the code shouldn't be looking
> at that byte. Is this something to pacify a lint checker?
Or valgrind at some point. I think a static checker that someone submitted
a report from.
> > - wc = wc2;
> > + wc = (wint_t) wc2;
>
> wc and wc2 are both integers so the cast is unnecessary. Similarly elsewhere.
There was a compiler (maybe VMS) for which this was necessary.
Thanks,
Arnold