Re: Unicode support (was: null-device)

From: Dave Love
Subject: Re: Unicode support (was: null-device)
Date: 22 Jul 2001 18:31:25 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.0

>>>>> "EZ" == Eli Zaretskii <address@hidden> writes:

 EZ> and if you try to save a buffer with Latin-3 text using
 EZ> ISO-8859-1 encoding, Emacs will say it's unable to do so, even if
 EZ> all the non-ASCII characters are from the subset of Latin-3 that
 EZ> is in the intersection of Latin-1 and Latin-3.

The unification solution to this involves a few lines of code (which
I've shown elsewhere) plus easily-generated tables.  If you unify on
decoding, as ISO 2022 appears to suggest, the issue basically doesn't
arise anyway and even Emacs 20 has that facility.  [I know a
programmer _can_ break this, because it's Emacs.]  Otherwise, you
could actually expurgate the Latin-3 charset in favour of a trivial
CCL coding system.
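
As a rough illustration of the idea (in Python rather than Emacs Lisp
or CCL, and not the code referred to above): if text is decoded to
Unicode first, the question of whether Latin-3 text can be saved as
ISO-8859-1 reduces to whether each character exists in the target
charset. The helper names `reencode`, `can_save_as` and `shared_chars`
are made up for this sketch; the codec names are Python's standard
ones.

```python
def reencode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from charset `src` to Unicode, then encode as `dst`.

    Raises UnicodeEncodeError if some character has no place in `dst`.
    """
    return data.decode(src).encode(dst)


def can_save_as(data: bytes, src: str, dst: str) -> bool:
    """True if every character of `data` (in `src`) also exists in `dst`."""
    try:
        reencode(data, src, dst)
        return True
    except UnicodeEncodeError:
        return False


def shared_chars(a: str, b: str) -> set:
    """Generate the 'table': upper-half characters of `a` also present in `b`."""
    out = set()
    for byte in range(0xA0, 0x100):
        try:
            ch = bytes([byte]).decode(a)   # some positions are unassigned
            ch.encode(b)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue
        out.add(ch)
    return out


# Latin-3 text that only uses characters also found in Latin-1.
shared = "café à la carte".encode("iso8859_3")
# Latin-3 text with an Esperanto letter that Latin-1 lacks.
esperanto = "ĉu".encode("iso8859_3")

print(can_save_as(shared, "iso8859_3", "iso8859_1"))     # intersection only
print(can_save_as(esperanto, "iso8859_3", "iso8859_1"))  # outside Latin-1
```

The table is generated from the codecs themselves rather than written
by hand, which is the sense in which such tables are cheap to produce.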

 EZ> You cannot support Unicode with this representation, because
 EZ> Unicode unifies characters by its very design principle.

I don't accept this definition of ‘support Unicode’.  Although I've
been assured it doesn't or can't, I maintain my Emacs (without
Mule-UCS) supports Unicode because at least:

 • It groks utf-8 (auto-detected in a utf-8 locale or from cues like
   ‘charset=’ in the file);

 • It can edit normally in the part of the BMP I need – Western
   technical text, including maths – better than, say, Yudit.  It
   works under X and tty with or without a Unicode font;

 • In the rest of the BMP it can edit infelicitously (this could be
   improved) and display the CJK space covered by whichever three
   charsets I chose in a quick go;

 • It has several Unicode-based input methods;

 • As above, it can unify 8859 and others through Unicode during
   coding conversion (I don't normally turn all that on, because it
   would mung some of the implementation files I edit);

 • It has (using Unicode tables) coding systems for all the charsets
   not in base Emacs which haible told me are relevant for GNU
   locales.  Their characters are unified by construction;

 • The MIME code DTRT, as (basically) does W3, for instance;

 • [It might DTRT with Unicode menu items under a suitable version of
   X, if that didn't get broken a while back].
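
The first item, choosing utf-8 from a ‘charset=’ cue or from the
locale, can be sketched as follows. This is Python, not what Emacs
does internally; the function name `guess_encoding`, the 1024-byte
scan window and the regular expression are assumptions made for the
illustration.

```python
import locale
import re
from typing import Optional

# Look for a cue such as `charset=utf-8` near the start of the data.
CHARSET_CUE = re.compile(rb"""charset=["']?([A-Za-z0-9._-]+)""", re.IGNORECASE)


def guess_encoding(data: bytes, locale_encoding: Optional[str] = None) -> str:
    """Prefer an explicit charset= cue; otherwise fall back to the locale."""
    match = CHARSET_CUE.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    if locale_encoding is None:
        # Ask the running locale, e.g. "UTF-8" in a utf-8 locale.
        locale_encoding = locale.getpreferredencoding(False)
    return locale_encoding.lower()


header = b"Content-Type: text/plain; charset=utf-8\n\n"
body = "2d sinθ = nλ".encode("utf-8")
message = header + body

encoding = guess_encoding(message)
text = message.decode(encoding)
print(encoding, text.splitlines()[-1])
```

A cue in the data wins over the locale here; that ordering is a design
choice of the sketch, not a claim about any particular editor.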

If I can find the enthusiasm, I'll package what I've done if and when
Emacs 21 is released.

 >> To attract hackers working on UTF-8 for Emacs Mule has to go away
 >> first.

This is false by counter-examples, even for values of ‘utf-8’ equal to
‘Unicode’.  The issue in my experience is making progress after
they're attracted.

The propaganda that gives rise to this false claim comes from people
who don't understand Mule and/or deliberately mislead about it and
about the people who work on it.  I admit to being misled initially.

 EZ> What do you mean by ``first''?  We need to replace the current
 EZ> representation by another, based on Unicode.

It's not clear to me that I need this as a Unicode user, even if I were
serious about wider or deeper coverage.  I don't doubt handa has a
good rationale for the re-implementation, though.  Someone might like
to justify it with arguments beyond coping with 8859.

If necessary, I could build a non-standard Emacs now with a different
set of private charsets to cover the whole BMP properly.  That's
undesirable if I ever have to deal with code or data using the
replaced charsets, but presumably it could be declared official.
Anyway, that level of compatibility has to break sometime.

Otherwise, handa proposed extending the code space (apparently doable
quickly) to accomplish the same sort of result with minimal grief.

Bragging about Unicode support: ‘2d sinθ = nλ’ is plain text.   ☺
