Re: Emacs Lisp's future

From: David Kastrup
Subject: Re: Emacs Lisp's future
Date: Sat, 27 Sep 2014 10:49:37 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

"Stephen J. Turnbull" <address@hidden> writes:

> Eli Zaretskii writes:
>  > > Date: Fri, 26 Sep 2014 18:45:54 +0400
>  > > From: Dmitry Antipov <address@hidden>
>  > > Cc: address@hidden
>  > > 
>  > > Why not just use ICU?
>  > 
>  > Emacs needs to be able to extend the Unicode code-point space for raw
>  > 8-bit bytes and for a couple of character sets that are not unified.
>
> No, you don't.  There's plenty of private space for those purposes
> (unless you know of private character sets that use more than two
> whole planes?)  Emacs would simply use an indirect representation for
> private space.  (That is, code points in private space are not
> necessarily identical to the input code points, but rather are indexes
> into an auxiliary table which implements the disjoint sum of the
> private code spaces in use.)
>
> Since this is private space, you need to build a table of attributes
> for these characters (I/O representation, UCD properties, glyphs, etc)
> anyway.  For Unicode input using private space, you just record that
> as the I/O representation.
>
>  > Can ICU support that?
>
> Maybe it would be unhappy if you used a lone surrogate representation
> (or other representation using integers outside of the Unicode
> character space) for those "extended code points", but as proposed
> above you can efficiently use private space in practice.

Except that Emacs, as an editor, needs to support the private spaces
users might want to use.  Hijacking the surrogates is a reasonable
compromise.  Another option would have been to hijack the code space
beyond Unicode character 1114111 (U+10FFFF) that a 4-byte sequence can
still encode.  That range is outside of UTF-8 but inside of the coding
scheme's logic, so it works equally well for string manipulations.
However, it would cause unencodable bytes to take up more space.  I
think LuaTeX may use that strategy.
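
To make the size trade-off concrete, here is a minimal Python sketch.
It is an illustration only, not Emacs's or LuaTeX's actual code.  The
function encode_extended applies the UTF-8 bit layout without UTF-8's
validity checks, so it accepts lone surrogates and values up to
0x1FFFFF.  Both raw-byte mappings are assumptions made for the
example: the surrogate one mirrors Python's "surrogateescape" error
handler, and the beyond-Unicode one is purely hypothetical.

```python
MAX_UNICODE = 0x10FFFF  # decimal 1114111, the last Unicode code point


def encode_extended(cp: int) -> bytes:
    """Encode cp with the UTF-8 bit layout, skipping UTF-8's validity rules.

    Real UTF-8 rejects surrogates and anything above U+10FFFF; this
    sketch accepts every value that fits in at most 4 bytes.
    """
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x1FFFFF:
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("value does not fit in a 4-byte sequence")


def raw_byte_as_surrogate(b: int) -> int:
    """Assumed mapping: raw byte 0x80..0xFF -> lone surrogate U+DC80..U+DCFF."""
    return 0xDC00 + b


def raw_byte_beyond_unicode(b: int) -> int:
    """Hypothetical mapping: raw byte 0x80..0xFF -> first values past U+10FFFF."""
    return MAX_UNICODE + 1 + (b - 0x80)
```

Under these assumptions a raw byte stored as a lone surrogate occupies
3 bytes, while the same byte stored beyond U+10FFFF occupies 4, which
is the extra space mentioned above.  For valid characters the sketch
produces the same bytes as ordinary UTF-8.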

Being an editor, Emacs has to be more circumspect than most other
encoding-sensitive applications about which code points it claims for
its own use, since anything that is "private" may well be within the
range that a user wants to be able to put into string literals.
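
A rough Python sketch of the indirect private-space table proposed in
the quoted text shows where this bites.  The class and method names
are hypothetical, and allocating slots from plane 15 only is an
assumption made to keep the example short.

```python
from typing import Dict, Tuple

# Unicode private use ranges: the BMP area plus planes 15 and 16.
PRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

FIRST_SLOT = 0xF0000  # assumption: internal slots come from plane 15 only
LAST_SLOT = 0xFFFFD


def is_private(cp: int) -> bool:
    """Return True if cp lies in one of the Unicode private use ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_RANGES)


class PrivateSpaceTable:
    """Hypothetical table mapping internal slots to (origin, value) pairs."""

    def __init__(self) -> None:
        self._next = FIRST_SLOT
        self._to_internal: Dict[Tuple[str, int], int] = {}
        self._from_internal: Dict[int, Tuple[str, int]] = {}

    def intern(self, origin: str, value: int) -> int:
        """Return the internal slot for (origin, value), allocating if new."""
        key = (origin, value)
        if key not in self._to_internal:
            if self._next > LAST_SLOT:
                raise OverflowError("private space exhausted")
            self._to_internal[key] = self._next
            self._from_internal[self._next] = key
            self._next += 1
        return self._to_internal[key]

    def read_char(self, cp: int) -> int:
        """Map an input code point to the code point stored internally."""
        if is_private(cp):
            # The user's own private code points must be remapped too,
            # or they would collide with slots the editor handed out.
            return self.intern("unicode-private", cp)
        return cp

    def lookup(self, internal: int) -> Tuple[str, int]:
        """Recover the (origin, value) pair to use when writing out."""
        return self._from_internal.get(internal, ("unicode", internal))
```

In this sketch, once a raw byte has been interned as slot U+F0000, a
user who types U+F0000 gets a different code point stored internally.
Input and output can still round-trip through lookup, but the value
that string manipulation sees is no longer the one the user wrote.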

David Kastrup
