Re: coding tags and utf-16
From: Werner LEMBERG
Subject: Re: coding tags and utf-16
Date: Wed, 04 Jan 2006 15:58:21 +0100 (CET)

> > There is a serious problem with coding tags and utf-16 encodings
> > of any flavour: Emacs simply can't recognize the tag. This is a
> > non-trivial problem.
>
> Sorry for the late reply, but I think coding tag is useless for a
> file encoded in some of utf-16 variants.
>
> If a file has BOM at the head, BOM should tell the exact encoding
> whatever is specified in coding tag.
>
> If a file is encoded without BOM, we must use the less reliable
> heuristics to guess utf-16be or utf-16le. If you find a coding-tag
> spec by ignoring all zero bytes at even byte indexes, it means that
> the file is, in high possibility, utf-16be whatever the tag value
> is. If you find a coding-tag spec by ignoring all zero bytes at odd
> byte indexes, it means that the file is utf-16le whatever the tag
> value is.
>
> So, in any cases, a tag value itself is useless. [...]
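
The heuristic described in the quoted text can be sketched roughly as
follows. This is only an illustration, not code from Emacs or groff:
the function names and the simplified coding-tag pattern are
assumptions, and byte indexes are counted from zero (so ASCII text in
utf-16be has its zero bytes at even indexes).

```python
import re

# Simplified, hypothetical pattern for a tag such as "-*- coding: latin-1 -*-".
CODING_TAG = re.compile(rb"coding:\s*([A-Za-z0-9_.-]+)")


def strip_zeros(data: bytes, parity: int) -> bytes:
    """Drop zero bytes whose 0-based index has the given parity (0 = even, 1 = odd)."""
    return bytes(b for i, b in enumerate(data) if not (b == 0 and i % 2 == parity))


def guess_utf16_variant(head: bytes):
    """Guess 'utf-16be' or 'utf-16le' for BOM-less data, or return None.

    The value written in the tag is deliberately ignored: only the
    position of the zero bytes that hide the tag matters.
    """
    if CODING_TAG.search(head):
        return None  # tag is readable as-is, so this is not UTF-16
    if CODING_TAG.search(strip_zeros(head, 0)):
        return "utf-16be"  # zero bytes sat at even indexes
    if CODING_TAG.search(strip_zeros(head, 1)):
        return "utf-16le"  # zero bytes sat at odd indexes
    return None
```

Note that the result depends only on where the zero bytes are, which is
the point made above: the tag value contributes nothing to the guess.
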

I'll do the following for groff's preprocessor, preconv:

  . If the data starts with a BOM, use it, and ignore the coding tag.

  . Otherwise, if there are zero bytes in the first two lines, ignore
    those zero values, emit a warning, and use the coding tag, if any.

  . Otherwise, use the default encoding -- this normally will lead to
    a wrong result and make groff explode, but I consider this better
    than applying heuristics, especially if you have to recognize both
    UTF-16 and UTF-32 variants.

This is probably a suboptimal solution but quite easy to implement,
and the user can always explicitly select an encoding on the command
line.  Perhaps someone finds (and implements) a better way which I can
then adapt to preconv.

Werner
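
The three steps listed in the message can be sketched as below. This is
a minimal illustration in Python, not preconv's actual code: the
function name `choose_encoding`, the simplified tag pattern, and the
`latin-1` default are all assumptions, and the last step is read here
as "no usable coding tag was found".

```python
import codecs
import re
import sys

# UTF-32 BOMs come first: the UTF-32LE BOM begins with the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Simplified, hypothetical pattern for a tag such as "-*- coding: koi8-r -*-".
CODING_TAG = re.compile(rb"coding:\s*([A-Za-z0-9_.-]+)")


def choose_encoding(data: bytes, default: str = "latin-1") -> str:
    # Step 1: a BOM decides the encoding; any coding tag is ignored.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # Step 2: look at the first two lines; if they contain zero bytes,
    # warn, drop the zeros, and then search for a coding tag.
    head = b"\n".join(data.split(b"\n", 2)[:2])
    if b"\0" in head:
        print("warning: zero bytes in first two lines ignored", file=sys.stderr)
        head = head.replace(b"\0", b"")
    match = CODING_TAG.search(head)
    if match:
        return match.group(1).decode("ascii")

    # Step 3: no BOM and no tag, so fall back to the default encoding.
    return default
```

For BOM-less UTF-16 input this returns whatever the tag says, even when
that is wrong for the data; the sketch accepts this in place of
heuristics, leaving an explicit command-line choice as the remedy.
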