Re: coding tags and utf-16

emacs-devel

From:	Werner LEMBERG
Subject:	Re: coding tags and utf-16
Date:	Wed, 04 Jan 2006 15:58:21 +0100 (CET)

> > There is a serious problem with coding tags and utf-16 encodings
> > of any flavour: Emacs simply can't recognize the tag.  This is a
> > non-trivial problem.
> 
> Sorry for the late reply, but I think coding tag is useless for a
> file encoded in some of utf-16 variants.
> 
> If a file has BOM at the head, BOM should tell the exact encoding
> whatever is specified in coding tag.
> 
> If a file is encoded without BOM, we must use the less reliable
> heuristics to guess utf-16be or utf-16le.  If you find a coding-tag
> spec by ignoring all zero bytes at even byte indexes, it means that
> the file is, in high possibility, utf-16be whatever the tag value
> is.  If you find a coding-tag spec by ignoring all zero bytes at odd
> byte indexes, it means that the file is utf-16le whatever the tag
> value is.
> 
> So, in any cases, a tag value itself is useless.  [...]
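
The zero-byte heuristic described in the quoted text can be sketched as follows. This is only an illustration, not Emacs code: the function name `guess_utf16_order`, the 256-byte sample size, and the simple majority rule are assumptions, and byte indexes are counted from 0.

```python
from typing import Optional


def guess_utf16_order(data: bytes, sample: int = 256) -> Optional[str]:
    """Guess the byte order of BOM-less UTF-16 from where zero bytes fall.

    Hypothetical helper: for mostly-ASCII text, UTF-16BE puts the zero
    byte first in each pair (even indexes, counting from 0), and
    UTF-16LE puts it second (odd indexes).
    """
    head = data[:sample]
    even_zeros = sum(1 for i in range(0, len(head), 2) if head[i] == 0)
    odd_zeros = sum(1 for i in range(1, len(head), 2) if head[i] == 0)
    if even_zeros > odd_zeros:
        return "utf-16be"
    if odd_zeros > even_zeros:
        return "utf-16le"
    return None  # no zero bytes, or no clear majority: cannot tell
```

Note that the result does not depend on the value written in the coding tag, which is the point made above: once the tag can be read at all, the byte order is already known.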

I'll do the following for groff's preprocessor, preconv:

  . If the data starts with a BOM, use it, and ignore the coding tag.

  . Otherwise, if there are zero bytes in the first two lines, ignore
    those zero values, emit a warning, and use the coding tag, if any.

  . Otherwise, use the default encoding -- this will normally lead to
    a wrong result and make groff explode, but I consider this better
    than applying heuristics, especially if you have to recognize both
    UTF-16 and UTF-32 variants.  This is probably a suboptimal
    solution, but it is quite easy to implement, and the user can
    always select an encoding explicitly on the command line.  Perhaps
    someone will find (and implement) a better way which I can then
    adapt to preconv.
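
The three steps above might look roughly like this in code. This is a sketch under stated assumptions, not preconv's actual implementation: the name `choose_encoding`, the BOM table (including the UTF-8 entry), the simplified `-*- coding: ... -*-` pattern, and the `latin-1` default are all illustrative, and a tag found in data without zero bytes is assumed to be honoured as usual.

```python
import codecs
import re
import warnings

# Assumed BOM table. UTF-32 comes first because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]

# Simplified pattern for an Emacs-style "-*- coding: NAME -*-" tag.
TAG = re.compile(rb"-\*-.*?coding:\s*([\w.-]+).*?-\*-")


def choose_encoding(data: bytes, default: str = "latin-1") -> str:
    """Pick an input encoding: BOM first, then coding tag, then default."""
    # Step 1: a BOM wins, and the coding tag is ignored.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # Step 2: look at the first two lines; drop zero bytes (with a
    # warning) so that a tag in UTF-16/UTF-32 data becomes readable.
    head = b"\n".join(data.split(b"\n", 2)[:2])
    if b"\0" in head:
        warnings.warn("zero bytes in first two lines; ignoring them")
        head = head.replace(b"\0", b"")
    match = TAG.search(head)
    if match:
        return match.group(1).decode("ascii").lower()

    # Step 3: no BOM and no tag, so fall back to the default encoding.
    return default
```

An encoding given explicitly on the command line would simply bypass this function altogether.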


      Werner
