emacs-devel

Re: undecided vs utf-8


From: Eli Zaretskii
Subject: Re: undecided vs utf-8
Date: Fri, 05 Nov 2010 10:09:14 +0200

> From: Lars Magne Ingebrigtsen <address@hidden>
> Date: Fri, 05 Nov 2010 03:32:02 +0100
> 
> Kenichi Handa <address@hidden> writes:
> 
> > It's perhaps because you are in some of iso-8859-1 locale.
> 
> I don't think I am, but I might be wrong.  There are so many locale
> variables, but I always try to put my machines into "C" locale.

"M-x mule-diag RET" will show what Emacs actually set up from your
locale.

> I don't know how the big5 encoding looks like, but when it comes to
> iso-8859-1 vs utf-8, then there are many utf-8 strings that are valid
> iso-8859-1 strings, but there are few iso-8859-1 strings that are valid
> utf-8 strings.  Therefore it seems to make sense to prefer utf-8 over
> iso-8859-1.  Perhaps.

That would replace one imperfect heuristic with another.  Each of
them fails sometimes, and when you hit the failing use-case, it is no
comfort whether you are among the 0.5% who lose or the 0.1%.

This is one use-case.  Let's investigate it thoroughly before we
ponder the possibility of changing global defaults for everyone.
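
For reference, the asymmetry claimed in the quoted text can be checked
with a few lines of Python.  This is only an illustration using
Python's own codecs, not anything Emacs does; the helper name
decodes_as and the sample word are assumptions made for the example.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if DATA is a valid byte sequence in ENCODING."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Sample text with non-ASCII Latin-1 characters (hypothetical input).
text = "blåbærsyltetøy"
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# Every byte value is defined in iso-8859-1, so UTF-8 bytes always
# "decode" as Latin-1, although the result is mojibake, not the text.
utf8_read_as_latin1 = decodes_as(utf8_bytes, "iso-8859-1")
mojibake = utf8_bytes.decode("iso-8859-1")

# Latin-1 bytes with accented letters are usually not well-formed
# UTF-8, because a lead byte must be followed by continuation bytes.
latin1_read_as_utf8 = decodes_as(latin1_bytes, "utf-8")

print(utf8_read_as_latin1, latin1_read_as_utf8, mojibake != text)
```

Note that "valid" here only means "decodes without error"; it says
nothing about whether the decoded text is what the sender meant.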

> Well, this is about `undecided', and the C layer does DWIM-ish
> processing when you ask it to decode `undecided', doesn't it?

No.  There's no DWIM-like behavior in how Emacs guesses under
`undecided'.  It goes by the priority list and uses the first encoding
that can decode all the characters in the input text.  This process is
driven entirely by the priority list; it does not consider anything
else.  The DWIM parts are those which set up the priority list from
your preferences and the locale.
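
The procedure described above can be sketched as follows.  This is a
simplified model in Python, assuming only what is stated here (walk
the list in order, take the first encoding that decodes the whole
input); it is not the actual C implementation, and the function name
detect and the two-entry priority lists are made up for the example.

```python
from typing import Optional, Sequence


def detect(data: bytes, priority: Sequence[str]) -> Optional[str]:
    """Return the first encoding in PRIORITY that decodes all of DATA."""
    for encoding in priority:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate fails, try the next one
        return encoding
    return None  # nothing in the list could decode the input


utf8_bytes = "blåbær".encode("utf-8")
latin1_bytes = "blåbær".encode("iso-8859-1")

# With Latin-1 first, UTF-8 input is accepted as Latin-1, because
# Latin-1 can decode any byte sequence.
latin1_first = ["iso-8859-1", "utf-8"]
print(detect(utf8_bytes, latin1_first))
print(detect(latin1_bytes, latin1_first))

# With UTF-8 first, the same two inputs are told apart.
utf8_first = ["utf-8", "iso-8859-1"]
print(detect(utf8_bytes, utf8_first))
print(detect(latin1_bytes, utf8_first))
```

In this model the outcome depends on nothing but the order of the
list, which is the point: changing the result means changing the list.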

> The use case that made me look into this -- erc -- is somewhat special.
> The irc protocol does no charset tagging, and some clients send some
> charsets, and some send others, which is why erc uses `undecided' as the
> default coding system.  Typically on a channel you'll see somebody using
> a local (iso-8859-* is popular) charset, and others using utf-8.

This means that even if we do want to change the priority list, it
should only be done for erc.  The global defaults do not need to be
touched.


