

Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c

From: Paul Eggert
Subject: Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c
Date: 28 Jan 2003 13:45:56 -0800
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.3

Bruno Haible <address@hidden> writes:

> If the character set has '\\' but it has a different codepoint than
> in ASCII then the ASCII optimizations should not apply.

In practice the optimizations work even for Japanese character sets
where '\' has a different code point; please see below.

> (Whether C_CTYPE_ASCII gets set to 1 on a system with ISO-646-CN or
> ISO-646-JP, will depend on the source code conversions that have been
> performed on the source file before compilation, maybe converting
> backslash to YEN SIGN or maybe not etc.

Yes.  I've seen and done a lot of this sort of conversion at Twin
Sun, as most of our customers are Japanese.  Invariably ASCII text is
not modified when it is translated to any of the practical JIS
variants.  Instead, its bytes are interpreted without change.  For
example, where a C program says "\n", one would see "<yen>n" when
viewing the text assuming any of the standard practical JIS encodings.
Japanese programmers and users are used to this phenomenon.  It's a
bit unpleasant, but the vast majority of users are far more bothered
if some process converts the text by translating the ASCII "\" to the
JIS backslash.
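
The byte-level behaviour can be modeled in a few lines of C. This is a minimal sketch: `render_jis_roman` is a hypothetical helper, and it assumes the viewer decodes single bytes as JIS-Roman (JIS X 0201), where byte 0x5C is drawn as a yen sign and byte 0x7E as an overline. No byte is rewritten; only the glyphs differ, written here with the `<yen>` and `<overbar>` placeholders used in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of S into OUT (size OUTSIZE) as a
   viewer assuming JIS-Roman (JIS X 0201) would display them.  Every byte
   is kept as is, except that byte 0x5C is shown as the placeholder
   "<yen>" and byte 0x7E as "<overbar>".  Output that does not fit is
   truncated; OUT is always NUL-terminated.  Returns OUT.  */
static char *
render_jis_roman (const char *s, char *out, size_t outsize)
{
  size_t n = 0;

  assert (outsize > 0);
  for (; *s != '\0'; s++)
    {
      char one[2] = { *s, '\0' };
      const char *glyph = one;
      size_t len;

      if ((unsigned char) *s == 0x5C)        /* '\' in ASCII */
        glyph = "<yen>";
      else if ((unsigned char) *s == 0x7E)   /* '~' in ASCII */
        glyph = "<overbar>";

      len = strlen (glyph);
      if (n + len >= outsize)
        break;                  /* no room left for GLYPH plus the NUL */
      memcpy (out + n, glyph, len);
      n += len;
    }
  out[n] = '\0';
  return out;
}
```

Passing it the four bytes of the C source text `"\n"` (written in C as `"\"\\n\""`) produces `"<yen>n"`: the same bytes, only displayed differently.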

> however it's not a problem for the c_* functions.

I agree, but for Japanese it works only because "<yen>" happens to
fall into the same ctype category as "\", and similarly for
"<overbar>" and "~".  Luckily for us, the other ISO 646 national
variants are no longer used, so we don't have to worry about the fact
that, for instance, ISO 646-DE substituted an alphabetic character for
the non-alphabetic ASCII character "~".  If c_ctype had been written
20 years ago, this would have been a real problem, but nowadays I
think it's OK to ignore it (though the code still deserves a comment
explaining why it works for Japanese and similar encodings).
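
The point about ctype categories can be made concrete with byte-value tests. The functions below are a hypothetical sketch, not gnulib's c-ctype.h: they classify by numeric byte value, as in the C locale on an ASCII-compatible system, and never consult the current locale. Bytes 0x5C and 0x7E come out as punctuation, which is also the right category for the yen sign and overline that JIS-Roman shows for those bytes. Under ISO 646-DE, by contrast, 0x7E is the letter ß, so the same test would misclassify it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical byte-value classifiers (not gnulib's c-ctype.h).  They
   test the numeric value of C against ASCII ranges, so the result does
   not depend on the current locale or on how a viewer draws the byte.  */
static bool
byte_isalpha (int c)
{
  return (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}

static bool
byte_isdigit (int c)
{
  return c >= 0x30 && c <= 0x39;
}

/* Graphic (printable, non-space) ASCII bytes that are not alphanumeric.  */
static bool
byte_ispunct (int c)
{
  return c > 0x20 && c < 0x7F && !byte_isalpha (c) && !byte_isdigit (c);
}

/* Bytes 0x5C and 0x7E are '\' and '~' in ASCII but a yen sign and an
   overline in JIS-Roman.  All four glyphs are punctuation, so the
   byte-value answer is right under either reading.  Under ISO 646-DE,
   0x7E is the letter ß, and byte_isalpha (0x7E) would wrongly report
   that letter as non-alphabetic.  */
static void
check_jis_roman_bytes (void)
{
  assert (byte_ispunct (0x5C) && !byte_isalpha (0x5C));
  assert (byte_ispunct (0x7E) && !byte_isalpha (0x7E));
}
```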

> > Besides, a few ones-complement hosts with C compilers are still in use
> > (Unisys mainframes)
> Let's hope that they get out of business soon :-)

Their LZW patent will expire in June (and I don't think they'll go out
of business before then :-), so this issue should become moot.  If
it's any consolation, I think their attempts to enforce the patent
have hurt their bottom line.

> > Anyway, if it's easy, it's better to avoid code that assumes two's
> > complement, since such code is a bit trickier to read
> On the contrary, such code is good teaching material for bit
> operations. Did you know that for every x
>      ((x - 1) & (- x - 1)) + 1 == x & -x 

No, I didn't know that.  But since == binds more tightly than &, don't
you have to parenthesize the C code like this?

       ((x - 1) & (- x - 1)) + 1 == (x & -x)
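
Both the identity and the precedence issue can be checked with a short C sketch. The helper names are hypothetical. The arithmetic is done in `unsigned`, which wraps modulo 2^N on every conforming C implementation regardless of how signed integers are represented, so it sidesteps the two's-complement question raised above; with signed `int`, `x - 1` would overflow for `INT_MIN`. `as_parsed_without_parens` spells out how C actually reads the unparenthesized expression.

```c
#include <assert.h>
#include <limits.h>

/* Left side of the identity.  Unsigned arithmetic wraps modulo 2^N, so
   x - 1 and -x - 1 are well defined for all x.  Here -x - 1 equals ~x,
   so this is ((x - 1) & ~x) + 1: the mask of the trailing zero bits of
   x, plus one.  */
static unsigned
identity_lhs (unsigned x)
{
  return ((x - 1) & (-x - 1)) + 1;
}

/* Right side: x & -x isolates the lowest set bit of x (0 when x == 0).  */
static unsigned
identity_rhs (unsigned x)
{
  return x & -x;
}

/* == binds more tightly than &, so "L == x & -x" without the extra
   parentheses means "(L == x) & -x".  This spells out that parse.  */
static unsigned
as_parsed_without_parens (unsigned x)
{
  return (identity_lhs (x) == x) & -x;
}

/* Assert the identity for every x in [0, LIMIT] and for UINT_MAX.  */
static void
check_identity_upto (unsigned limit)
{
  for (unsigned x = 0; ; x++)
    {
      assert (identity_lhs (x) == identity_rhs (x));
      if (x == limit)
        break;
    }
  assert (identity_lhs (UINT_MAX) == identity_rhs (UINT_MAX));
}
```

For x = 12 (binary 1100), both sides equal 4, but the unparenthesized form computes (4 == 12) & -12, which is 0.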

My own favorite property in this area is much simpler.  I learned it
by attending a talk by Edsger W. Dijkstra, where he spent way too much
time on it since he thought it was neat too.  The property is that
"==" is associative on booleans.

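Since there are only eight combinations of three booleans, the property is easy to check exhaustively. A minimal sketch with hypothetical helper names: `eq_associative_on_bools` compares `(a == b) == c` with `a == (b == c)` for all of them, and `check_parity` asserts the equivalent characterization that `(a == b) == c` is true exactly when an odd number of `a`, `b`, `c` are true. The restriction to booleans matters, as `eq_associative_on_ints` shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Compare (a == b) == c with a == (b == c) for all eight combinations of
   three booleans.  Note that C parses a == b == c as (a == b) == c.  */
static bool
eq_associative_on_bools (void)
{
  for (int i = 0; i < 8; i++)
    {
      bool a = i & 1, b = i & 2, c = i & 4;
      if (((a == b) == c) != (a == (b == c)))
        return false;
    }
  return true;
}

/* An equivalent view: (a == b) == c is true exactly when an odd number
   of a, b and c are true.  */
static void
check_parity (void)
{
  for (int i = 0; i < 8; i++)
    {
      bool a = i & 1, b = i & 2, c = i & 4;
      int ntrue = a + b + c;
      assert (((a == b) == c) == (ntrue % 2 == 1));
    }
}

/* The property needs boolean operands: with ordinary ints,
   (1 == 2) == 0 is 1 but 1 == (2 == 0) is 0.  */
static bool
eq_associative_on_ints (int a, int b, int c)
{
  return ((a == b) == c) == (a == (b == c));
}
```

Because C's == is left-associative, an expression such as `a == b == c` on booleans therefore gives the same answer whichever way it is grouped.
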
