Re: One more string functions change

From: Yuri Khan
Subject: Re: One more string functions change
Date: Sun, 29 Jun 2014 00:26:19 +0700

On Sat, Jun 28, 2014 at 11:21 PM, Dmitry Antipov <address@hidden> wrote:

> What's wrong with case tables? If we're talking about Unicode only,
> is it enough/possible/desirable to have just one (huge) case table
> for all supported characters?

It’s not generally possible, because in Turkic locales there is this
funny couple of letters, i and dotless ı. They uppercase into dotted İ
and I, respectively. This makes uppercase a function dependent on the
locale.
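
To illustrate the Turkic case, here is a minimal sketch in Python (chosen
only for illustration; it is not how Emacs does it). The helper `upcase`
and its `turkic` flag are hypothetical names standing in for "the current
locale is Turkic"; Python's built-in `str.upper` itself takes no locale.

```python
# Hypothetical helper, not a standard-library function.
# `turkic` is an assumed flag standing in for a Turkic locale setting.
def upcase(text: str, turkic: bool = False) -> str:
    if turkic:
        # In Turkic locales, dotted i uppercases to dotted İ (U+0130).
        text = text.replace("i", "\u0130")
    # Dotless ı (U+0131) already uppercases to plain I in str.upper,
    # so it needs no special handling here.
    return text.upper()


default_result = upcase("istanbul")              # plain I at the start
turkic_result = upcase("istanbul", turkic=True)  # dotted İ at the start
```

The same input yields two different results depending on the flag, which
is why a single locale-independent table cannot cover both conventions.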

Further, comparing strings case-insensitively by downcasing is wrong,
because of this funny German letter ß (sharp s, eszett), and these
funny Greek letters σ (sigma) and ς (final sigma). Straße is
case-insensitively equivalent to STRASSE, but they downcase to straße
and strasse, respectively. Both sigma σ and final sigma ς are
case-insensitively equivalent to Capital Sigma Σ, but the small letters
downcase to themselves and Capital Sigma downcases to σ, so ς and Σ
never compare equal after downcasing.
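
A short Python sketch of the failure, assuming comparison is done by
downcasing both sides. The function name `equal_by_downcasing` is made up
for this example.

```python
# Naive case-insensitive comparison by downcasing (the approach that fails).
def equal_by_downcasing(a: str, b: str) -> bool:
    return a.lower() == b.lower()


# ß stays ß when downcased, so the two spellings do not match.
strasse_match = equal_by_downcasing("Straße", "STRASSE")

# Final sigma ς downcases to itself, Capital Sigma Σ downcases to σ.
final_sigma_match = equal_by_downcasing("ς", "Σ")

# Plain sigma σ and Capital Sigma Σ do match, which hides the problem.
sigma_match = equal_by_downcasing("σ", "Σ")
```

Here `strasse_match` and `final_sigma_match` come out false even though
both pairs should be considered equal when ignoring case.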

The right, Unicode-compliant way to compare strings case-insensitively
involves a mapping called case folding, which is similar to
downcasing, but subtly different. For example, it expands ß into ss,
normalizes final sigma to normal sigma, and performs many other
expansions. Case-folded strings are generally not meant for human
consumption, only for case-insensitive comparison. Details can be
found in the Unicode Standard, section 5.18.
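
As a rough illustration, Python exposes case folding as `str.casefold`.
The wrapper `equal_ignoring_case` below is a hypothetical name, and the
sketch ignores Unicode normalization, which a complete caseless match
would also need.

```python
# Compare by case folding instead of downcasing.
# Sketch only: no normalization step is applied before or after folding.
def equal_ignoring_case(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


folded = "Straße".casefold()  # ß is expanded to ss

strasse_match = equal_ignoring_case("Straße", "STRASSE")
final_sigma_match = equal_ignoring_case("ς", "Σ")
sigma_match = equal_ignoring_case("σ", "Σ")
```

All three comparisons succeed here, and `folded` shows the ß expansion
that makes the folded form unsuitable for display.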
