Re: [gawk-devel] changing regex lib

bug-gnulib

From:	Bruno Haible
Subject:	Re: [gawk-devel] changing regex lib
Date:	Sat, 11 Aug 2018 01:25:11 +0200
User-agent:	KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; )

Hi Paul,

> Thanks for checking. I installed the regcomp.c change into glibc and gnulib
> so we should now have the same source there as we have in Gawk.

The patch [1] looks correct to me, but it introduces a misleading comment that
could become the cause of future bugs.

Recall that for arguments c in the range 0x80..0xFF, btowc(c) can very well
be different from c (this is obvious for encodings != ISO-8859-1 on glibc,
and true even for ISO-8859-1 on Solaris and FreeBSD [2]). So, a unibyte
character and a wide character "live" in different domains. There is a
risk that a wide character function (isw*) gets called on a value that is
a unibyte character, and a risk that btowc() gets called on a value that
is already a wide character; both would be bugs.
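
To make the domain mismatch concrete, here is a minimal sketch that counts
the bytes for which btowc (c) != c in the current locale. The helper name
count_domain_mismatches is hypothetical and exists only for this
illustration; it is not part of regcomp.c. The result depends entirely on
the platform and the locale in effect, so no particular count is claimed.

```c
#include <assert.h>
#include <wchar.h>

/* Count the bytes C in LO..HI for which btowc (C) differs from C in the
   current locale.  A WEOF result (invalid byte) also counts as a
   difference.  Hypothetical helper, for illustration only.  */
static int
count_domain_mismatches (int lo, int hi)
{
  assert (0 <= lo && lo <= hi && hi <= 0xFF);

  int mismatches = 0;
  for (int c = lo; c <= hi; c++)
    {
      wint_t wc = btowc (c);
      if (wc != (wint_t) c)
        mismatches++;
    }
  return mismatches;
}
```

Calling count_domain_mismatches (0x80, 0xFF) after setlocale (LC_ALL, "")
in various locales shows how often a byte value and its wide character
value disagree on a given system.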

Therefore I would rewrite the comment

/* Convert the byte B to the corresponding wide character.  In a
   unibyte locale, treat B as itself.  In a multibyte locale, return
   WEOF if B is an encoding error.  */

to

/* Convert the byte B to a value that bounds the iteration through a
   character range.
   In a unibyte locale, we use a bit set based on byte values, therefore
   return B itself.  Note! This may be != btowc (B).
   In a multibyte locale, we use comparison of wide characters, therefore
   return the wide character corresponding to B, or WEOF if B is invalid.  */
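
For illustration, here is a sketch of a function that behaves the way this
comment describes. It is not the code from the patch [1]: the name
range_bound and the boolean parameter multibyte are assumptions made for
this example, standing in for whatever the real code uses to tell the two
cases apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Convert the byte B to a value that bounds the iteration through a
   character range.
   In a unibyte locale (MULTIBYTE false), return B itself; this may be
   != btowc (B).
   In a multibyte locale (MULTIBYTE true), return the wide character
   corresponding to B, or WEOF if B is invalid.
   Hypothetical sketch, not the actual regcomp.c function.  */
static wint_t
range_bound (unsigned char b, bool multibyte)
{
  return multibyte ? btowc (b) : (wint_t) b;
}

/* In the unibyte case the bound must be the byte value itself, for
   every byte, regardless of what btowc would return.  */
static void
check_unibyte_identity (void)
{
  for (int b = 0; b <= 0xFF; b++)
    assert (range_bound ((unsigned char) b, false) == (wint_t) b);
}
```

The point of keeping the two branches visibly separate is that a caller
must not pass the unibyte result to isw* functions, nor apply btowc() again
to the multibyte result.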

Bruno

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=c77bf91b4315efed2b61633567acc7ac3c46959c
[2] https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html
