[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: gawk regex stuff you may want
From: |
Paul Eggert |
Subject: |
Re: gawk regex stuff you may want |
Date: |
Mon, 18 Jan 2016 11:07:15 -0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 |
Aharon Robbins wrote:
Attached is a diff with bits and pieces (fixes / changes) in gawk's
regex routines that you may wish to apply to the GNULIB version.
Thanks for doing all that. I looked over that diff and installed the attached
patches into gnulib; they are taken from your diff.
Here are comments on the parts of the diff not incorporated in this round:
-static const char __re_error_msgid[] =
+static const char __re_error_msgid[] attribute_hidden =
Since the constant is static, there should be no need for attribute_hidden, as
the constant is used only in this compilation unit. Similarly for other
introductions of attribute_hidden.
- re_dfa_t *dfa = bufp->buffer;
+ re_dfa_t *dfa = (re_dfa_t *) bufp->buffer;
This cast is not needed, as bufp->buffer is already of the proper type.
Similarly elsewhere.
- preg->buffer = dfa;
+ preg->buffer = (unsigned char *) dfa;
This cast seems counterproductive, as dfa is already of the proper type and the
unsigned char * is not the proper type.
- dfa->subexp_map = re_malloc (Idx, preg->re_nsub);
+ /* some malloc()-checkers don't like zero allocations */
+ if (preg->re_nsub > 0)
+ dfa->subexp_map = re_malloc (int, preg->re_nsub);
+ else
+ dfa->subexp_map = NULL;
+
Since malloc (0) is well-defined to return either NULL or a valid pointer to a
zero-byte object that can be freed, the code is working as-is. I'd rather look
for a solution that involves silencing the checking without making the code
bulkier and typically slower.
+ /*
+ * Fedora Core 2, maybe others, have broken `btowc' that returns -1
+ * for any value > 127. Sigh. Note that `start_ch' and `end_ch' are
+ * unsigned, so we don't have sign extension problems.
+ */
start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM)
- ? __btowc (start_ch) : start_elem->opr.wch);
+ ? start_ch : start_elem->opr.wch);
end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
- ? __btowc (end_ch) : end_elem->opr.wch);
+ ? end_ch : end_elem->opr.wch);
if (start_wc == WEOF || end_wc == WEOF)
return REG_ECOLLATE;
I don't see how this patch is helpful. If btowc returns -1 we are looking at an
encoding error, and we can't treat that byte as if it were a character. In some
single-byte locales, btowc returns 1 for values < 128, and they're encoding
errors too. Perhaps you could mention a use case?
/* Build single byte matching table for this equivalence class. */
+ char_buf[1] = (unsigned char) '\0';
This should be unnecessary, as the rest of the code shouldn't be looking at that
byte. Is this something to pacify a lint checker?
- wc = wc2;
+ wc = (wint_t) wc2;
wc and wc2 are both integers so the cast is unnecessary. Similarly elsewhere.
- pstr->mbs[char_idx] = toupper (ch);
+ if (islower (ch))
+ pstr->mbs[char_idx] = toupper (ch);
+ else
+ pstr->mbs[char_idx] = ch;
toupper is a no-op if ch is not a lower-case character, so the patch should be
unnecessary.
+ /*
+ * ADR: valgrind says size can be 0, which then doesn't
+ * free the block of size 0. Harumph. This seems
+ * to work ok, though.
+ */
+ if (size == 0)
+ {
+ memset(set, 0, sizeof(*set));
+ return REG_NOERROR;
+ }
I assume this is related to the malloc (0) stuff above. Or possibly
MALLOC_0_IS_NONNULL is incorrect on your platform? If so, I'd rather fix that
bug rather than attempt to work around it.
diff --git a/lib/regex_internal.h b/lib/regex_internal.h
index 0307a34..c634a00 100644
--- a/lib/regex_internal.h
+++ b/lib/regex_internal.h
@@ -117,6 +117,10 @@
# define BE(expr, val) __builtin_expect (expr, val)
#else
# define BE(expr, val) (expr)
+# ifdef inline
+# undef inline
+# endif
+# define inline
#endif
This should not be necessary, since m4/regex.m4's gl_PREREQ_REGEX requires
AC_C_INLINE, which should define 'inline' on the ancient systems that lack it.
If that's not working, I'd like to fix the problem at the source rather than
work around it here.
0001-regex-fix-memory-leaks.patch
Description: Text Data
0002-regex-fix-diagnostic.patch
Description: Text Data
0003-regex-pacify-static-checkers.patch
Description: Text Data