bug-bash


Re: accents


From: Andreas Schwab
Subject: Re: accents
Date: Mon, 16 May 2011 00:38:47 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)

Chet Ramey <chet.ramey@case.edu> writes:

> On 5/10/11 9:17 AM, Greg Wooledge wrote:
>
>> In yours, however, it is 0x65 0xcc 0x81 which is U+0065 LATIN SMALL
>> LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
>
> That's not valid UTF-8, since UTF-8 requires that the shortest sequence
> be used to encode a character.

0x65 0xcc 0x81 is the correct UTF-8 encoding for the two-character
sequence U+0065 U+0301.
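
For illustration, a minimal Python sketch of that point (not part of the
original message; the helper name is_valid_utf8 is made up here). It
decodes the three bytes with a strict decoder and lists the resulting
code points. It also tries 0xc1 0xa5, an overlong two-byte spelling of
U+0065, which is the kind of sequence the shortest-form rule forbids.

```python
# The three bytes quoted in the thread.
data = bytes([0x65, 0xCC, 0x81])

# Strict decoding: raises UnicodeDecodeError on invalid UTF-8.
text = data.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in text]


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The shortest-form rule concerns overlong encodings of ONE code point,
# e.g. 0xC1 0xA5 as a two-byte spelling of U+0065. It says nothing about
# how many code points are used to write a given glyph.
overlong_e = bytes([0xC1, 0xA5])

print(code_points)
print(is_valid_utf8(data), is_valid_utf8(overlong_e))
```

Each code point is encoded on its own: U+0065 takes one byte and U+0301
takes two, so the sequence is already in shortest form.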

> The general problem with combining
> characters still exists (the one in the message I referenced in an
> earlier reply), but this case has more to do with Mac OS X and its use
> of both precomposed and decomposed UTF-8 than anything.

There is no such thing as "precomposed UTF-8" and "decomposed UTF-8".
UTF-8 is an encoding of Unicode, and both NFD and NFC are valid forms of
Unicode.
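
As a rough sketch of the distinction (again not part of the original
message), Python's standard unicodedata module can convert between the
two normalization forms. The variable names are illustrative only. Both
strings are valid Unicode, and both encode to valid UTF-8; only the
code point sequence differs.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 U+0301, the NFD form
precomposed = "\u00e9"   # U+00E9, the NFC form

# Normalization operates on code points, before any byte encoding.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

# UTF-8 simply encodes whichever code point sequence it is given.
nfc_bytes = nfc.encode("utf-8")   # 0xc3 0xa9
nfd_bytes = nfd.encode("utf-8")   # 0x65 0xcc 0x81

print(nfc == precomposed, nfd == decomposed)
print(nfc_bytes.hex(" "), nfd_bytes.hex(" "))
```

A program that compares file names or user input byte for byte would
see these as different strings unless it normalizes both sides to the
same form first.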

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


