Re: [Bug-gnulib] Re: strtok_r
From: Bruno Haible
Subject: Re: [Bug-gnulib] Re: strtok_r
Date: Fri, 12 Nov 2004 14:28:13 +0100
User-agent: KMail/1.5

Simon Josefsson wrote:
> I'll install this in gnulib now.
>
> /* Parse S into tokens separated by characters in DELIM.
> If S is NULL, the saved pointer in SAVE_PTR is used as
> the next starting point. For example:
> char s[] = "-abc-=-def";
> char *sp;
> x = strtok_r(s, "-", &sp); // x = "abc", sp = "=-def"
> x = strtok_r(NULL, "-=", &sp); // x = "def", sp = NULL
> x = strtok_r(NULL, "=", &sp); // x = NULL
> // s = "abc\0-def\0"
>
> For the POSIX documentation for this function, see:
> http://www.opengroup.org/onlinepubs/009695399/functions/strtok.html
>
> Caveat: It modifies the original string.
> Caveat: These functions cannot be used on constant strings.
> Caveat: The identity of the delimiting character is lost.
> Caveat: It doesn't work with multibyte strings unless all of the
> delimiter characters are ASCII characters < 0x80.
>
> See also strsep().
> */
Yes, this looks good, except that the 0x80 should really be 0x30. Most
multibyte encodings have the property that an ASCII character is encoded as
a single byte, with the same value as in ASCII. But here, in order to use,
say, '0' or 'A' as a delimiter, you need a different property: that every
occurrence of a byte with a given ASCII value denotes that ASCII character
and is never part of a multibyte character. This property holds for UTF-8
and the EUC-* encodings. Unfortunately, the following widely used encodings
don't have this property:
  BIG5 BIG5-HKSCS GBK SHIFT_JIS
           don't have the property for 0x40 <= c <= 0x7E
  GB18030  doesn't have the property for 0x30 <= c <= 0x39, 0x40 <= c <= 0x7E
  JOHAB    doesn't have the property for 0x31 <= c <= 0x7E
GB18030 in particular is likely to stay around for a long time. Therefore
0x30 really is the limit: only delimiter characters < 0x30 are usable.
Bruno