Re: Unicode support (was: null-device)

From: Eli Zaretskii
Subject: Re: Unicode support (was: null-device)
Date: Wed, 18 Jul 2001 09:41:28 +0300 (IDT)

> [Not sure what's the right forum to discuss this issue.]

The right forum is address@hidden; I've redirected the followups.

> Eli Zaretskii <address@hidden> writes:
> > Volunteers are welcome to come on board of Emacs development and work on
> > switching to Unicode as the internal representation of characters.
> What does this mean "Unicode as the internal representation of
> characters"?  Dropping Mule in favor of Unicode/UTF-8?

``Dropping Mule'' is ambiguous, because Mule means different features
to different people.  Let me explain a bit more.

Currently, the internal representation of characters within Emacs is
based on the principle that each charset is separate and disjoint from
other charsets.  Thus, é in Latin-1 and the same character in
Latin-3 are distinct characters, as far as Emacs is concerned.  They
are represented by two different integers inside a buffer, and if you
try to save a buffer with Latin-3 text using ISO-8859-1 encoding,
Emacs will say it's unable to do so, even if all the non-ASCII
characters are from the subset of Latin-3 that is in the intersection
of Latin-1 and Latin-3.
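
To make the difference concrete, here is a toy model in Python.  It is
only a sketch: the charset ids and the packing scheme are invented for
illustration and are not the actual Emacs internal format.  The
function names disjoint_code and unified_code are likewise
hypothetical.  The first function tags each 8-bit code with its
charset, so the two charsets can never overlap; the second maps the
same byte to its Unicode code point using Python's standard codecs.

```python
# Toy model only: these charset ids and this packing are invented for
# illustration; they are NOT the real Emacs internal representation.
CHARSET_IDS = {"latin-1": 1, "latin-3": 3}  # hypothetical ids

# Python codec names for the two charsets.
CODECS = {"latin-1": "iso8859_1", "latin-3": "iso8859_3"}


def disjoint_code(charset, byte):
    """Tag an 8-bit code with its charset, so charsets never overlap."""
    return (CHARSET_IDS[charset] << 8) | byte


def unified_code(charset, byte):
    """Map the same 8-bit code to its Unicode code point."""
    return ord(bytes([byte]).decode(CODECS[charset]))


E_ACUTE = 0xE9  # e-acute sits at 0xE9 in both ISO-8859-1 and ISO-8859-3

# Disjoint model: same letter, two different integers.
print(hex(disjoint_code("latin-1", E_ACUTE)),
      hex(disjoint_code("latin-3", E_ACUTE)))

# Unified model: same letter, one integer.
print(hex(unified_code("latin-1", E_ACUTE)),
      hex(unified_code("latin-3", E_ACUTE)))
```

In the disjoint model the two integers differ even though the letter
is the same, which is why saving is refused; in the unified model they
coincide.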

You cannot support Unicode with this representation, because Unicode
unifies characters by its very design principle.

``Switching to Unicode'' means that the internal representation of
characters--the integers you find in buffers and strings that stand
for the characters--is changed to be based on Unicode codepoints.
They are not _exactly_ Unicode codepoints, or at least do not _have_ to
be identical, because Emacs still needs to support other character
sets for those few cultures which oppose unification, but they should
be some simple transformation of Unicode.

(I suggest that we don't speak about UTF-8 in the context of this
discussion because UTF-8 is an encoding: it's a way of transmitting
Unicode text with 8-bit bytes.  It's not a character set.)
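
A short Python illustration of that distinction, offered only as a
sketch (the variable names are arbitrary): the code point is a single
integer that identifies the character, while UTF-8 and ISO-8859-1 are
two different byte sequences that transmit the same character.

```python
ch = "\u00e9"                    # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)             # the character's number in Unicode
utf8 = ch.encode("utf-8")        # one way to put it on the wire
latin1 = ch.encode("iso8859_1")  # another way, for the same character

print(hex(code_point))  # the code point, independent of any encoding
print(utf8)             # two bytes in UTF-8
print(latin1)           # one byte in ISO-8859-1

# Decoding either byte sequence gives back the same character.
assert utf8.decode("utf-8") == latin1.decode("iso8859_1") == ch
```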

> > I attach below a few words which I hope will make it clear that
> > without motivated individuals that will begin working on this RSN,
> > Unicode support in Emacs will remain a pipe dream for a long, long
> > time.
> There are motivated individuals (e.g. Eric Naggum -- at least, he
> was...).

Erik Naggum, and a few other people who actively participated in the
discussion of the Unicode-based design I mentioned, disappeared from
sight as soon as the design was agreed upon.  I don't know why that
happened, but the fact is that no one coded anything according to
that agreed-upon design, except Handa-san lately.

> > So if you really care about Unicode support in Emacs, please consider
> > working on some of the required infrastructure.
> To attract hackers working on UTF-8 for Emacs Mule has to go away first.

What do you mean by ``first''?  We need to replace the current
representation by another, based on Unicode.  When the new
representation is in place, the old one will definitely not be in
Emacs, because you can't have two conflicting representations.

But if you mean that we should remove Mule, release a version of Emacs
without any non-ASCII support except for unibyte locales, and then,
several releases later, add a Unicode-based non-ASCII support, then
this isn't going to happen.
