Re: [gawk-devel] changing regex lib
From: Bruno Haible
Subject: Re: [gawk-devel] changing regex lib
Date: Sat, 11 Aug 2018 01:25:11 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; )

Hi Paul,
> Thanks for checking. I installed the regcomp.c change into glibc and gnulib
> so we should now have the same source there as we have in Gawk.
The patch [1] looks correct to me, but it introduces a misleading comment that
could become the cause of future bugs.
Recall that for arguments c in the range 0x80..0xFF, btowc(c) can very well
be different from c (this is obvious for encodings != ISO-8859-1 on glibc,
and true even for ISO-8859-1 on Solaris and FreeBSD [2]). So, a unibyte
character and a wide character "live" in different domains. There is a risk
that a wide character function (isw*) gets called on a value that is a
unibyte character, and a risk that btowc() gets called on a value that is
already a wide character; both would be bugs.
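To check how far the two domains diverge on a given system, something along these lines can be used. This is only a sketch: count_mismatches is a name made up for this mail, and the result depends entirely on the locale that is active when it is called.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: count the bytes c in LO..HI for which
   btowc (c) differs from c in the current locale.  A byte that
   btowc maps to WEOF counts as a difference too.  */
static int
count_mismatches (int lo, int hi)
{
  assert (0 <= lo && lo <= hi && hi <= 0xFF);
  int n = 0;
  for (int c = lo; c <= hi; c++)
    if (btowc (c) != (wint_t) c)
      n++;
  return n;
}
```

Calling count_mismatches (0x80, 0xFF) after setlocale (LC_ALL, "") reports how many high bytes would be misinterpreted if a byte value were passed to an isw* function unchanged.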
Therefore I would rewrite the comment
  /* Convert the byte B to the corresponding wide character.  In a
     unibyte locale, treat B as itself.  In a multibyte locale, return
     WEOF if B is an encoding error.  */
to
  /* Convert the byte B to a value that bounds the iteration through a
     character range.
     In a unibyte locale, we use a bit set based on byte values, therefore
     return B itself.  Note!  This may be != btowc (B).
     In a multibyte locale, we use comparison of wide characters, therefore
     return the wide character corresponding to B, or WEOF if B is invalid.  */
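
To illustrate the distinction this comment draws, here is a minimal sketch. The names range_bound and add_byte_range, the bool parameter, and the 32-byte bit set are assumptions made for this mail; they are not the identifiers or data structures used in regcomp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical stand-in for the helper that the comment documents.  */
static wint_t
range_bound (unsigned char b, bool multibyte)
{
  /* Unibyte: ranges are looked up in a bit set indexed by byte value,
     so the bound stays in the byte domain.  No btowc here.  */
  if (!multibyte)
    return b;
  /* Multibyte: ranges are compared as wide characters.  btowc returns
     WEOF if B is not a valid single-byte character.  */
  return btowc (b);
}

/* Hypothetical unibyte consumer: mark the bytes LO..HI in a
   256-bit set.  The bounds are byte values, never wide characters.  */
static void
add_byte_range (unsigned char set[32], unsigned char lo, unsigned char hi)
{
  wint_t start = range_bound (lo, false);
  wint_t end = range_bound (hi, false);
  assert (start <= 0xFF && end <= 0xFF);
  for (wint_t c = start; c <= end; c++)
    set[c / 8] |= 1u << (c % 8);
}
```

The point of the sketch is that the value returned in the unibyte case is only meaningful as a bit set index; passing it on to an isw* function would be exactly the kind of bug described above.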
Bruno
[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=c77bf91b4315efed2b61633567acc7ac3c46959c
[2] https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html