Re: Cyrillic vs UTF-8

From: Simon Josefsson
Subject: Re: Cyrillic vs UTF-8
Date: Fri, 25 Apr 2003 19:09:07 +0200
User-agent: Gnus/5.090019 (Oort Gnus v0.19) Emacs/21.3.50 (gnu/linux)

"Eli Zaretskii" <address@hidden> writes:

>> From: Simon Josefsson <address@hidden>
>> Date: Fri, 25 Apr 2003 18:12:17 +0200
>> I think there are two problems.  Opening the file the first time
>> should guess it is a utf-8 file.
> IIRC, you need to make the priority of utf-8 higher for this to
> happen.  Unless that's changed in the current CVS, try evaluating the
> following expression:
>   (prefer-coding-system 'utf-8)
> before you visit a utf-8 encoded file, and see if that helps.  I think
> this is because the encoding detection routines cannot distinguish
> between Latin-n and utf encoding without some help.

This works, but note that Emacs didn't recognize the file as being in
any encoding without it.  The modeline says '-:--'.

It seems binary is preferred over utf-8 and utf-16-* in
coding-category-list.  This seems extremely conservative.  I guess it
means UTF-8 can never be autodetected by default?  Is the Unicode
support so bad it shouldn't even be preferred over binary?  UTF-8 is
well formed and restricted; detecting it properly (even compared to
Latin-n) can be done well enough that failures rarely happen in
practice.
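
To illustrate the point about UTF-8 being restricted, here is a minimal
sketch in Python (not Emacs Lisp, and not how the Emacs detection
routines are implemented).  The function name `is_valid_utf8` and the
sample strings are made up for the example.  It only shows that text in
a legacy 8-bit encoding usually fails strict UTF-8 validation:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8.

    Note that pure ASCII also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample data: the same Cyrillic text in two encodings,
# plus a German phrase in Latin-1.
cyrillic = "Привет, мир"
utf8_bytes = cyrillic.encode("utf-8")
koi8_bytes = cyrillic.encode("koi8_r")
latin1_bytes = "Grüße aus Köln".encode("latin-1")

for label, raw in [("utf-8", utf8_bytes),
                   ("koi8-r", koi8_bytes),
                   ("latin-1", latin1_bytes)]:
    print(label, is_valid_utf8(raw))
```

Only the first sample passes.  The 8-bit samples fail because a high
byte in UTF-8 must be followed by the right number of continuation
bytes, which ordinary Latin-n or KOI8 text almost never provides.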

Can't we move binary down below UTF-8 in CVS?  IMHO we should move
UTF-8 earlier still, since determining whether data is UTF-8 or not
can be done with good probability.  Preferring binary over UTF-8 seems
just wrong.

There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
say, but it has been removed both in 21.3 and in CVS.  I thought that
meant UTF-8 was better supported now, but this doesn't seem to be the
case.
