Re: regexps and locales
From: Olivier Wittenberg
Subject: Re: regexps and locales
Date: Fri, 4 Feb 2005 23:41:52 +0100
User-agent: Mutt/1.4.2.1i
On Tue, Feb 01, 2005 at 12:33:11PM -0500, Chet Ramey wrote:
> Isn't a one-byte-long filename still a valid filename?
Yes.
> As far as I know, as long as you can get a one-byte filename
> created, readdir will return it, and `?' should match it.
readdir will return it, but if I understand POSIX correctly, '?'
should not match it.
> I don't think Posix says that; it says that `?' matches `a character'. If
> you have a filename returned by readdir, and you're in a multibyte locale,
> `?' will match a character, wide or not.
Here's how POSIX defines the word 'character' (in Base Definitions):
    3.87 Character

    A sequence of one or more bytes representing a single graphic symbol
    or control code.

    Note:
    This term corresponds to the ISO C standard term multi-byte
    character, where a single-byte character is a special case of a
    multi-byte character. Unlike the usage in the ISO C standard,
    character here has no necessary relationship with storage space,
    and byte is used when storage space is discussed.
According to this definition, I understand that "\351", for example,
is a byte, but is not a character in a UTF-8 locale, hence should not
be matched by '?'.
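
To make the byte/character distinction concrete, here is a minimal sketch
in Python. It uses Python's own UTF-8 and Latin-1 codecs as a stand-in for
the locale's encoding, and the helper name is_single_character is made up
for illustration. It says nothing about what bash or any glob
implementation actually does; it only shows that the lone byte \351 is not
a complete UTF-8 sequence, whereas the same byte is one character in a
single-byte encoding.

```python
def is_single_character(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if data decodes to exactly one character in encoding.

    Hypothetical helper: the codec stands in for the locale's charset.
    """
    try:
        return len(data.decode(encoding)) == 1
    except UnicodeDecodeError:
        # The bytes do not form a valid sequence in this encoding.
        return False


lone_byte = b"\351"  # octal 351 == 0xE9

# 0xE9 is the lead byte of a three-byte UTF-8 sequence, so alone it is
# a byte but not a character.
print(is_single_character(lone_byte))                       # False

# In ISO-8859-1 (Latin-1) the same byte is a single character.
print(is_single_character(lone_byte, "latin-1"))            # True

# The UTF-8 encoding of U+00E9 is two bytes (0xC3 0xA9), one character.
print(is_single_character("\u00e9".encode("utf-8")))        # True
```

Under the reading of POSIX given above, only the last two cases would be
candidates for a match by '?'.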
That's how I interpret the POSIX specification, though I'd certainly
prefer to be proven wrong.
Best,
--Olivier