bug-gnulib

Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c


From: Paul Eggert
Subject: Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c
Date: 28 Jan 2003 10:40:23 -0800
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.3

Bruno Haible <address@hidden> writes:

> Even if the C locale of a system actually uses Latin-1 (such systems
> are e.g. AmigaOS), the functions here will use ASCII.

True, but my question is what the symbol C_CTYPE_ASCII means.  That
is, I am trying to understand the implementation, not trying to
understand the API.

From your remarks, apparently you mean for C_CTYPE_ASCII to mean "the
character set is upward compatible with JIS X 0201:1997 left half
(Japanese JIS Roman)".  (Most people who use that character set know
it by its JIS name, not its ISO name.)  A problem with this
interpretation, though, is that you're testing for characters that are
not in that character set.  That character set doesn't have '\' or
'~', yet the C_CTYPE_ASCII #if tests for those two characters.
Conversely, the #if doesn't test for '$' or '@', even though those two
characters are in JIS Roman and your remarks suggest that you intended
to test for them.

Perhaps something like this would be a better substitute?

   /* The character set is close enough to ASCII (ISO 646 IRV:1991)
      that all the c_ctype macros and functions should operate
      correctly.  For example, JIS X 0201:1997 left half (Japanese JIS
      Roman, ISO 646-JP) is ASCII-like, even though it substitutes
      other symbols for '\' and '~'.  The #if above does not test for
      '$' or '@', because they are not part of the portable C
      character set; assume their codes are ASCII-like as well.  */
   #define C_CTYPE_ASCII_LIKE 1

and then use "C_CTYPE_ASCII_LIKE" in the code.

> > > #define c_isascii(c) \
> > >   ({ int __c = (c); \
> > >      ((__c & ~0x7f) == 0); \
> > >    })
> > 
> > This isn't correct for a signed-magnitude host
> 
> Such hosts don't exist any more for more than 20 years.

Would you be convinced by an efficiency argument?
On my host (GCC 2.95.3 with -O2, sparc), the unportable code:

  int f (int x) { return (x & ~0x7f) == 0; }

requires 4 instructions, but the portable code:

  int g (unsigned x) { return x <= 0x7f; }

requires only 3.  This is just one platform, but typically I expect a
single comparison to be faster than a mask followed by a test.  So the
more-portable version is also faster, in common practice.

Besides, a few ones-complement hosts with C compilers are still in use
(Unisys mainframes), and the code doesn't work properly on those hosts
either, since it mishandles c_isascii (-0).

Anyway, if it's easy, it's better to avoid code that assumes two's
complement, since such code is a bit trickier to read (and it may
elicit bug reports from pedants :-).


> For debugging it is best to use -O0, and in this case "c-ctype.h"
> will use the external functions, not the macros.

But that's two copies of the code, which have to be maintained
separately.  With inline functions you have one less copy of the code,
so it should be less error-prone.


> how would I denote e.g. #\End-Of-Transmission in a
> portable way, so that it evaluates to 0x04 on ASCII hosts and 0x37 on
> EBCDIC hosts?

This is a minor nit, but you should be able to detect what
kind of EBCDIC host you're on by using the portable character set
and/or user-specified flags (preferably settable at run time), and use
that to decide which translation table to use.  Admittedly this is not
for the queasy, and is perhaps better deferred for an EBCDIC expert,
but if you're actually worried about EBCDIC support (as opposed to
just having a hook for EBCDIC support later), that is the way to go.



