Re: Emacs Lisp's future

From: Eli Zaretskii
Subject: Re: Emacs Lisp's future
Date: Sat, 27 Sep 2014 12:32:56 +0300

> From: "Stephen J. Turnbull" <address@hidden>
> Date: Sat, 27 Sep 2014 17:35:12 +0900
> Cc: Dmitry Antipov <address@hidden>, address@hidden, address@hidden
> Eli Zaretskii writes:
>  > > Date: Fri, 26 Sep 2014 18:45:54 +0400
>  > > From: Dmitry Antipov <address@hidden>
>  > > Cc: address@hidden
>  > > 
>  > > Why not just use ICU?
>  > 
>  > Emacs needs to be able to extend the Unicode code-point space for raw
>  > 8-bit bytes and for a couple of character sets that are not unified.
> No, you don't.  There's plenty of private space for those purposes
> (unless you know of private character sets that use more than two
> whole planes?)

I take it that you have studied the charsets for which we use
codepoints above 0x10FFFF, and concluded that they all fit in the
2*64K+6.4K PUA space provided by Unicode?  We have several quite large
character sets which need that (grep mule-conf.el for ":unify-map" to
see the list, and see etc/charsets/ for the map files).  I'm not sure
the PUA space is large enough, but I didn't sum all the numbers.
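For anyone who wants to do that sum, here is a minimal Python sketch of the arithmetic. The three ranges are the Private Use Areas defined by Unicode; the charset names and sizes in the example are purely hypothetical placeholders, not figures taken from mule-conf.el or etc/charsets/.

```python
# Private Use Areas defined by Unicode (inclusive code point ranges).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def pua_capacity(ranges=PUA_RANGES):
    """Total number of private-use code points in the given ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def fits_in_pua(charset_sizes):
    """Return True if the summed charset sizes fit in the PUA.

    charset_sizes maps a charset name to the number of characters
    that are not unified with Unicode (values must be supplied by
    the caller, e.g. counted from the map files).
    """
    return sum(charset_sizes.values()) <= pua_capacity()


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    example = {"charset-a": 50000, "charset-b": 60000, "raw-bytes": 128}
    print(pua_capacity())        # 6400 + 65534 + 65534 = 137468
    print(fits_in_pua(example))  # True for these made-up numbers
```

The real answer depends on the actual counts from the map files, which this sketch does not attempt to supply.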

In any case, the question why we don't use PUA for this is best
addressed to Handa-san (CC'ed).

> Emacs would simply use an indirect representation for
> private space.  (That is, code points in private space are not
> necessarily identical to the input code points, but rather are indexes
> into an auxiliary table which implements the disjoint sum of the
> private code spaces in use.)

IIUC, this is a non-trivial complication.  Currently, our mapping is
set up so that we can keep the non-unified characters in our buffers,
while you propose indirection via tables.  This means, for example,
that direct access to char-tables will become slower.
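To make the extra step concrete, here is a rough Python sketch of the kind of auxiliary table I understand the proposal to mean. The class name, its methods, and the choice of plane 15 as the private range are all assumptions made for illustration; none of this is existing Emacs or ICU code.

```python
class PrivateSpaceTable:
    """Hypothetical disjoint-sum table: (charset, code) <-> private code point."""

    def __init__(self, start=0xF0000, end=0xFFFFD):
        # Default range is Supplementary Private Use Area-A (an assumption).
        self.start = start
        self.end = end
        self._to_private = {}    # (charset, code) -> private code point
        self._from_private = []  # index (cp - start) -> (charset, code)

    def intern(self, charset, code):
        """Return the private code point for (charset, code), allocating one if new."""
        key = (charset, code)
        cp = self._to_private.get(key)
        if cp is None:
            cp = self.start + len(self._from_private)
            if cp > self.end:
                raise OverflowError("private space exhausted")
            self._to_private[key] = cp
            self._from_private.append(key)
        return cp

    def resolve(self, cp):
        """Map a private code point back to its original (charset, code)."""
        index = cp - self.start
        if not 0 <= index < len(self._from_private):
            raise KeyError(hex(cp))
        return self._from_private[index]


if __name__ == "__main__":
    table = PrivateSpaceTable()
    cp = table.intern("charset-a", 0x2121)  # hypothetical charset name
    print(hex(cp), table.resolve(cp))
```

Every character taken from such a charset needs an `intern` call on input and a `resolve` call on output or on property lookup, whereas the current scheme keeps the character itself in the buffer.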

> Since this is private space, you need to build a table of attributes
> for these characters (I/O representation, UCD properties, glyphs, etc)
> anyway.  For Unicode input using private space, you just record that
> as the I/O representation.

Yes, and the question is how well ICU supports setting these up.
I don't know the answer to that.

It is also not clear to me whether what you suggest will support the
internal representation of raw bytes and their conversion to and from
their external (a.k.a. "encoded") 8-bit values.
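For reference, the current raw-byte mapping is a fixed offset into the code-point range above Unicode. The Python sketch below mirrors my reading of the BYTE8_TO_CHAR and CHAR_TO_BYTE8 macros in src/character.h; the Python function names are mine, and the constants should be checked against the source before relying on them.

```python
MAX_UNICODE = 0x10FFFF   # last Unicode code point
MAX_CHAR = 0x3FFFFF      # last Emacs character code (assumed from character.h)
BYTE8_OFFSET = 0x3FFF00  # offset added to a raw byte (assumed from character.h)


def byte8_to_char(byte):
    """Internal character code for a raw 8-bit byte (0x80..0xFF)."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("not a raw 8-bit byte: %r" % (byte,))
    return byte + BYTE8_OFFSET


def char_is_byte8(c):
    """True if the character code represents a raw 8-bit byte."""
    return byte8_to_char(0x80) <= c <= MAX_CHAR


def char_to_byte8(c):
    """External 8-bit value for a raw-byte character code."""
    if not char_is_byte8(c):
        raise ValueError("not a raw-byte character: %r" % (c,))
    return c - BYTE8_OFFSET


if __name__ == "__main__":
    c = byte8_to_char(0xE9)
    print(hex(c), c > MAX_UNICODE, hex(char_to_byte8(c)))
```

The conversion in both directions is pure arithmetic with no table involved, and that is the property any PUA-based replacement would have to preserve or pay for.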

In any case, I agree that using ICU in Guile would be a huge step
forward, because currently they simply rely on the underlying libc,
which is only a more-or-less safe bet when libc is glibc; if not, the
results fall very short of what the user needs and Emacs expects.
