
Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c


From: Bruno Haible
Subject: Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c
Date: Tue, 28 Jan 2003 21:57:59 +0100 (CET)

Paul Eggert writes:

> True, but my question is what the symbol C_CTYPE_ASCII means.  That
> is, I am trying to understand the implementation, not trying to
> understand the API.

It means: The character set is ASCII or one of its variants or
extensions, not EBCDIC. I've corrected the comment now.

> From your remarks, apparently you mean for C_CTYPE_ASCII to mean "the
> character set is upward compatible with JIS X 0201:1997 left half
> (Japanese JIS Roman)".

Sorry, I must have expressed myself badly. If the character set has
'\\' but at a different code point than in ASCII, then the ASCII
optimizations should _not_ apply.

> Conversely, the #if doesn't test for '$' or '@', even though those two
> characters are in JIS Roman and your remarks suggest that you intended
> to test for '$' and '@'.

My earlier remarks were wrong. '$' and '@' are not tested, precisely
because these characters are not part of the "basic character
set".
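
To make the idea concrete, here is a minimal sketch of such a compile-time
test. It is not the actual c-ctype.h code: the names SKETCH_CTYPE_ASCII and
sketch_isupper are hypothetical, and only a handful of code points from the
basic character set are checked, where a real test would cover all of them.
Note that '$' and '@' do not appear in the condition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for C_CTYPE_ASCII.  Only members of the basic
   character set are compared against their ASCII code points; '$' and
   '@' are deliberately left out.  A complete test would list every
   basic character, not just these few.  */
#if ('0' == 48) && ('9' == 57) && ('A' == 65) && ('Z' == 90) \
    && ('a' == 97) && ('z' == 122) && ('\\' == 92)
# define SKETCH_CTYPE_ASCII 1
#else
# define SKETCH_CTYPE_ASCII 0
#endif

/* Hypothetical helper showing how the symbol could select an
   ASCII-only optimization or a portable fallback.  */
static bool
sketch_isupper (int c)
{
#if SKETCH_CTYPE_ASCII
  /* In ASCII the uppercase letters are contiguous, so a range test
     is enough.  */
  return c >= 'A' && c <= 'Z';
#else
  /* Portable fallback, e.g. for EBCDIC where the letters are not
     contiguous.  */
  switch (c)
    {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
    case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
    case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
    case 'V': case 'W': case 'X': case 'Y': case 'Z':
      return true;
    default:
      return false;
    }
#endif
}
```

If '\\' had a different code point, the condition would be false and the
switch-based fallback would be compiled instead of the range test.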

ISO-646-CN is probably not a problem; it can be handled like ASCII.
(Whether C_CTYPE_ASCII gets set to 1 on a system with ISO-646-CN or
ISO-646-JP will depend on the source code conversions that have been
performed on the source file before compilation, maybe converting
backslash to YEN SIGN or maybe not, etc. However, it's not a problem
for the c_* functions.)

> Would you be convinced by an efficiency argument?
> On my host (GCC 2.95.3 with -O2, sparc), the unportable code:
> 
>   int f (int x) { return (x & ~0x7f) == 0; }
> 
> requires 4 instructions, but the portable code:
> 
>   int g (unsigned x) { return x <= 0x7f; }
> 
> requires only 3.

OK, why not. On x86 also, the generated code for

    int g (int x) { return x >= 0 && x <= 0x7f; }

is smaller.
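
For reference, the three variants discussed here can be put side by side.
This is only a sketch for comparison; the function names are made up for
illustration and are not part of c-ctype.h.

```c
/* Mask test from the quoted message.  For negative x it relies on the
   two's complement representation having high bits set.  */
int
in_ascii_mask (int x)
{
  return (x & ~0x7f) == 0;
}

/* Unsigned comparison from the quoted message.  A negative int argument
   is converted to a large unsigned value, so it is rejected without any
   assumption about the signed representation.  */
int
in_ascii_unsigned (unsigned x)
{
  return x <= 0x7f;
}

/* Signed range test; also independent of the signed representation.  */
int
in_ascii_range (int x)
{
  return x >= 0 && x <= 0x7f;
}
```

On a two's complement host all three agree for every int argument; only
the first one depends on that representation.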

> Besides, a few ones-complement hosts with C compilers are still in use
> (Unisys mainframes)

Let's hope that they get out of business soon :-) (They would already
have, if they didn't succeed in extorting money from people who
believe in patent threats.)

> Anyway, if it's easy, it's better to avoid code that assumes two's
> complement, since such code is a bit trickier to read

On the contrary, such code is good teaching material for bit
operations. Did you know that, for every x,

     ((x - 1) & (- x - 1)) + 1 == (x & -x)

holds?
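
A small sketch to check this identity. Two things are assumptions of the
sketch rather than part of the statement above: the right-hand side is
read as (x & -x), since in C == binds more tightly than &, and x is taken
as unsigned so that the arithmetic wraps around instead of overflowing.
The function names are made up for illustration.

```c
/* Lowest set bit of x, computed directly.  0u - x is the unsigned
   negation of x; the result is 0 for x == 0.  */
unsigned
lowest_bit_direct (unsigned x)
{
  return x & (0u - x);
}

/* The same value by the roundabout route.  (0u - x) - 1 equals ~x in
   modular arithmetic, so (x - 1) & ~x is the mask of the zero bits
   below the lowest set bit, and adding 1 yields that bit itself.  */
unsigned
lowest_bit_roundabout (unsigned x)
{
  return ((x - 1) & ((0u - x) - 1)) + 1;
}
```

For example, with x = 12 (binary 1100) both functions return 4.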

> > For debugging it is best to use -O0, and in this case "c-ctype.h"
> > will use the external functions, not the macros.
> 
> But that's two copies of the code, which have to be maintained
> separately.  With inline functions you have one less copy of the code,
> so it should be less error-prone.

In general, I agree. In this case here, the functions won't change in
10 years.

Bruno



