Re: coding tags and utf-16

emacs-devel

From:	Kenichi Handa
Subject:	Re: coding tags and utf-16
Date:	Tue, 07 Mar 2006 10:02:05 +0900
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

In article <address@hidden>, Benjamin Riefenstahl <address@hidden> writes:

> Kenichi Handa writes:
>> For decoding UTF-8, we should not delete that BOM but treat it as
>> the content of the text.  For UTF-16, Unicode explicitly says that
>> "The BOM is not considered part of the content of the text", but for
>> UTF-8, it doesn't say such a thing.

> NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
> UTF-8 files.  When I saw that and tried to discuss it on their
> newsgroups, I learned that it seems to be Microsoft's POV that this is
> a good thing.

> Which means files like that exist.  Treating the BOM as content means
> that U+FEFF creeps into the regular content of documents through
> cut-and-paste and through components of template systems.  I have
> already seen that happening in real life and of course it leads to
> stupid bugs.  I think Emacs should do better.

But it is simply a bug to delete the leading U+FEFF from the
content while decoding utf-8.  Perhaps we should add a
customizable flag to control that behavior after the
release.
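
The two behaviors under discussion can be sketched as follows.  This is
Python rather than Emacs Lisp and is only an illustration, not how Emacs
decodes text.  The `decode_utf8` helper and its `strip_bom` parameter are
hypothetical names standing in for the "customizable flag" mentioned
above; the `utf-8`, `utf-8-sig` and `utf-16` codecs are standard Python.

```python
import codecs

# A UTF-8 file as Notepad might write it: the BOM bytes EF BB BF, then the text.
raw_utf8 = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the leading U+FEFF as part of the content.
as_content = raw_utf8.decode("utf-8")

# "utf-8-sig" treats a leading BOM as a signature and drops it.
as_signature = raw_utf8.decode("utf-8-sig")


def decode_utf8(data: bytes, strip_bom: bool = False) -> str:
    """Decode UTF-8 bytes; hypothetical flag chooses how a leading BOM is handled."""
    return data.decode("utf-8-sig" if strip_bom else "utf-8")


# For comparison, Python's generic "utf-16" decoder uses the BOM to pick
# the byte order and does not return it as content.
raw_utf16 = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
utf16_text = raw_utf16.decode("utf-16")
```

With the flag off, the U+FEFF survives as the first character of the
decoded string, which is where the cut-and-paste problems described in
the quoted message come from; with it on, the text starts at "h".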

>> utf-16-be [==] utf-16be-with-signature [!=] utf-16be

> ;-)

^.^;;;

---
Kenichi Handa
address@hidden


Current Thread

Re: coding tags and utf-16, Kenichi Handa, 2006/03/02
- Re: coding tags and utf-16, Benjamin Riefenstahl, 2006/03/04
  - Re: coding tags and utf-16, Kenichi Handa, 2006/03/06
    - Re: coding tags and utf-16, Benjamin Riefenstahl, 2006/03/06
    - Re: coding tags and utf-16, Kenichi Handa <=
  - Re: coding tags and utf-16, Tomas Zerolo, 2006/03/08
- Re: coding tags and utf-16, Kenichi Handa, 2006/03/15
