

Re: [gawk-devel] changing regex lib

From: Paul Eggert
Subject: Re: [gawk-devel] changing regex lib
Date: Sat, 11 Aug 2018 23:51:44 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Bruno Haible wrote:

Therefore I would rewrite the comment

/* Convert the byte B to the corresponding wide character.  In a
    unibyte locale, treat B as itself.  In a multibyte locale, return
    WEOF if B is an encoding error.  */

to

/* Convert the byte B to a value that bounds the iteration through a
    character range.
    In a unibyte locale, we use a bit set based on byte values, therefore
    return B itself.  Note! This may be != btowc (B).
    In a multibyte locale, we use comparison of wide characters, therefore
    return the wide character corresponding to B, or WEOF if B is invalid.  */

Hmm, well, both comments are pretty confusing to me. However, the first one is less confusing: it says that the function does X, then that in one special case the function does Y (which contradicts X), and then that in another special case the function does Z (which also contradicts X). Although this sort of wording is not strictly logical, it is pretty routine in comments and not that hard to follow.

Rather than spend much time worrying about this little comment, it'd probably be more helpful to document the intended behavior of rational ranges. As I understand it, Arnold wants them to use byte values in unibyte locales and wide character values in multibyte locales. This intent is worth mentioning somewhere central, particularly since there are multiple places in the code where it is not implemented properly.
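
To make that intent concrete, here is a rough sketch of the rule as I understand it, not the actual gawk, dfa, or regex code. The names range_endpoint and in_range are hypothetical, invented for illustration; only MB_CUR_MAX, btowc, wint_t, and WEOF are standard C.

```c
#include <stdlib.h>   /* MB_CUR_MAX */
#include <wchar.h>    /* btowc, wint_t, WEOF */

/* Hypothetical helper, not taken from the gawk sources.
   Convert the byte B to the value used to bound a character range.
   In a unibyte locale ranges compare byte values, so return B itself;
   this may differ from btowc (B).  In a multibyte locale ranges compare
   wide characters, so return btowc (B), which is WEOF if B is not a
   valid single-byte character.  */
static wint_t
range_endpoint (unsigned char b)
{
  if (MB_CUR_MAX == 1)
    return b;
  return btowc (b);
}

/* Hypothetical membership test: is C within [LO, HI]?  All three
   values are assumed to be in the same domain (bytes in a unibyte
   locale, wide characters otherwise).  WEOF never matches.  */
static int
in_range (wint_t lo, wint_t hi, wint_t c)
{
  return lo != WEOF && hi != WEOF && c != WEOF && lo <= c && c <= hi;
}
```

The point of the sketch is only that the locale test happens in one place, so every caller that walks a range gets the same notion of "endpoint". Whether the real code should be organized this way is a separate question.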
