bug-gnulib

Re: new module c-strstr


From: Paul Eggert
Subject: Re: new module c-strstr
Date: Fri, 18 Aug 2006 16:47:28 -0700
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

Bruno Haible <address@hidden> writes:

> Therefore most of our "c-*" modules should better be called
> "ascii-*" or "unibyte-*".

But both ASCII and other unibyte locales might say that some bytes are
encoding errors.  So none of these names are exactly right.  I guess
c-* is as good a name as any.

>> I think this claim isn't true for some weird non-ASCII encoding
>> schemes like DBCS-Host.
>
> Are these used as locale encodings? Many of these so-called DBCS encodings
> are stateful and therefore not usable as locale encodings.

Some are stateful, some not.  As I understand it, the former are more
common, but I have practical experience only with the latter.  They
are used as locale encodings in C environments.  I'd expect Cobol to
be similar but don't know about it.

> Non-nearly-ASCII-compatible encodings don't appear in the world where GNU
> programs are deployed.

This is true for GNU programs that deal with encodings.  My guess is
that most people who use GNU software use --disable-nls and the like
when they run in non-ASCII environments, and don't bother to file bug
reports because they don't expect much help from us.  That being said,
GNU make and GCC are used on OS/390, as well as Python and Perl.
People have ported other GNU tools like M4.  (Admittedly it is an
uphill battle...)

> But it's important to know that   c_strstr (s, "x")  is not safe and
> c_strstr (s, "123")  is also not safe. The programmer needs to have the
> precise criteria.

I don't quite follow this. c_strstr (S, "x") is safe in all cases; it
never has undefined behavior.  It's true that the result might not
be the same as strstr (S, "x"), but that's the point of having
c_strstr, right?  So I would change this:

> /* The functions defined in this file assume a nearly ASCII compatible
>    character set.  */

to

/* The functions defined in this file act on null-terminated byte
   strings, without regard to locale.  */

and this:

>    This function is safe to be called, even in a multibyte locale, if NEEDLE
>    ...

to this:

   This function is safe to be called, even in all known multibyte locales
   derived from ASCII, if NEEDLE ...



