Re: 23.0.60; [nxml] BOM and utf-8
From: tomas
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Thu, 22 May 2008 06:17:45 +0200
User-agent: Mutt/1.5.15+20070412 (2007-04-11)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Thu, May 22, 2008 at 12:37:11AM +0200, Patrick Drechsler wrote:
> Patrick Drechsler <address@hidden> writes:
This is really a question for w3.org, but...
> > ,----[ http://www.w3.org/TR/2006/REC-xml-20060816/#charencoding ]
> > | Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY
> > | begin with the Byte Order Mark [...]
> > | [...] XML processors MUST be able to use this character to
> > | differentiate between UTF-8 and UTF-16 encoded documents.
> > `----
...and how are the XML processors supposed to achieve that? Is there a
second variant of BOM, indicating UTF-8?
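For reference, the same character U+FEFF serializes to different bytes in each encoding: EF BB BF in UTF-8, FE FF in big-endian UTF-16 and FF FE in little-endian UTF-16. A processor can therefore tell the encodings apart by looking at the first bytes of the entity. The following is only a minimal sketch of that idea, not what nxml or any particular XML processor does; the function name sniff_bom is made up for illustration, and UTF-32 (whose little-endian BOM also starts with FF FE) is deliberately ignored.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration; UTF-32 is not handled.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    # No BOM: the caller has to fall back on the encoding declaration
    # or on external information.
    return None
```

With no BOM present the function returns None, which corresponds to the case where the encoding declaration (or the UTF-8 default) has to decide.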
> > and
> >
> > ,----[
> > http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-with-ext-info ]
> > | If an XML entity is in a file, the Byte-Order Mark and encoding
> > | declaration are used (if present) to determine the character encoding.
> > `----
...or is it rather the absence of a BOM that indicates UTF-8?
Am I completely whacko, or are they?
Sorry. I am confused.
Ah, and BTW: interpreting the BOM as whitespace is not that far off --
as stated in <http://unicode.org/faq/utf_bom.html#38>:
| Q: What should I do with U+FEFF in the middle of a file?
|
| A: In the absence of a protocol supporting its use as a BOM and when not
| at the beginning of a text stream, U+FEFF should normally not occur. For
| backwards compatibility it should be treated as ZERO WIDTH NON-BREAKING
| SPACE (ZWNBSP), and is then part of the content of the file or string.
[...]
But that applies "in the middle of a file", not at the beginning, as in
our case.
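The distinction between a leading and an embedded U+FEFF can be seen with Python's standard codecs, used here purely as an illustration (this says nothing about how Emacs or nxml decode files). The "utf-8-sig" codec treats only the first U+FEFF as a BOM and drops it, while the plain "utf-8" codec keeps every occurrence as content. The sample bytes are made up.

```python
# Made-up sample: a UTF-8 BOM at the start and another U+FEFF inside the element.
raw = b"\xef\xbb\xbf<a>\xef\xbb\xbf</a>"

# Plain UTF-8 decoding keeps both U+FEFF characters in the text.
as_utf8 = raw.decode("utf-8")

# "utf-8-sig" strips the leading BOM only; the embedded U+FEFF stays
# in the content, where it acts as ZERO WIDTH NO-BREAK SPACE.
as_sig = raw.decode("utf-8-sig")

print(repr(as_utf8))
print(repr(as_sig))
```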
I'd appreciate any insights.
Thanks
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFINPPpBcgs9XrR2kYRAutgAJ9BXb32mnDV53T3RTOBu4LGmOfHIgCfUxNG
EJYtPO908ac75bw1vERvRyQ=
=IQaH
-----END PGP SIGNATURE-----