gnu-emacs-sources

various Mule stuff, especially Cyrillic & Unicode improvements


From: Dave Love
Subject: various Mule stuff, especially Cyrillic & Unicode improvements
Date: 16 Apr 2002 19:55:25 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1.95

I've dumped a load of Mule-related stuff I've done that may be of
interest under <URL:ftp://dlpx1.dl.ac.uk/fx/emacs/Mule> (all for Emacs
21).  It partly updates things I've posted before.

Much of this depends on the translation tables in ucs-tables.el.  It's
useful on its own to provide global minor modes for unifying input to
Unicode (not recommended generally) and `unifying' on encoding so that
coding systems can encode a set of unicodes regardless of their
iso-2022 charset (which Western users definitely want).  This is
mostly done for iso-8859, but could be extended.  [`Unify' isn't
necessarily an accurate term, but it's what people recognize.]
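
As a rough illustration, turning the encoding-side mode on from an
init file might look like the fragment below.  This is only a sketch:
it assumes the ucs-tables.el from this collection is on `load-path'
in an Emacs 21 and that it provides the feature `ucs-tables'.  The
mode names are the ones listed in the README further down.

```elisp
;; Sketch only: assumes the ucs-tables.el from this collection is
;; installed and provides the feature `ucs-tables' (an assumption).
(require 'ucs-tables)

;; Let coding systems encode equivalent characters regardless of the
;; charset they happen to be stored in (what Western users want).
(unify-8859-on-encoding-mode 1)

;; Unifying to Unicode on decoding is not recommended generally, so
;; it is left commented out here.
;; (unify-8859-on-decoding-mode 1)
```

Both are global minor modes, so the same settings should also be
reachable through Custom.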

The extra language support depends on the many extra coding systems
defined by code-pages.el.  There's a patch which would allow them to
be autoloaded.

There's substantially improved coverage of language environments,
coding systems and input methods.  Cyrillic is reworked, in
particular.  It's not clear whether it's best to base Cyrillic on
8859-5 or the 8859-5/Unicode mixture that's now there, but there's
support for unifying 8859 to Unicode and fragmenting Cyrillic (and
Greek) characters to more space-efficient 8859-5.  Feedback welcome.
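
For utf-8 text, the fragmentation direction is controlled by the
`utf-8-fragment-on-decoding' option named in the README below.  A
hedged sketch of switching it on, assuming the replacement utf-8.el
is the one loaded and that the option is a simple boolean:

```elisp
;; Sketch only: assumes the utf-8.el from this collection is loaded
;; and that `utf-8-fragment-on-decoding' accepts t/nil (an assumption).
;; `customize-set-variable' is used instead of `setq' so that any
;; setter function attached to the option gets run.
(customize-set-variable 'utf-8-fragment-on-decoding t)
```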

Quail hooks are provided so that the actual encoding of Quail input
methods normally should be irrelevant for characters covered by
ucs-tables -- you can use latin-1-postfix to search for text in a
Latin-9 buffer.

The utf-8 coding system is much extended (and fixed for invalid
utf-8).  With a patch for the CCL interpreter applied, you could have
utf-8 and utf-16 support at least to the level of Mule-UCS,
i.e. including CJK and level 3 for supported scripts, but the CJK
support needs some simple tidying up by someone interested.

There's no documentation outside the files' commentaries and the doc
strings, but the intent is that it mainly just DTRT when installed.
There are Custom variables for the things you might want to set up or
change.  Note that some of the files replace preloaded ones, so you
might want to dump a new Emacs to use them.

Here's the README for what's there.  Hope it's useful.

These are various Mule-related files, some modified from the Emacs
21.1 sources for various purposes, others new.  They're mostly whole
source files (which I've been using to some extent) rather than diffs,
which are more of a pain to manage.  Some of this has made it into the
Emacs development sources, some was rejected, some is new.  It's
probably all Emacs 21-specific.

 * lisp/international/characters.el: Extended, particularly for unicodes.

 * lisp/international/ucs-tables.el: Translation between Unicode and
   other Mule charsets, providing `unification' of European charsets
   (`unify-8859-on-decoding-mode' and `unify-8859-on-encoding-mode').
   New command `ucs-insert'.  Extended `decode-char', `encode-char'.
   Hook into Quail to translate input method characters to conform
   to the buffer file coding system.  Extended `ccl-encode-unicode-font'
   to be able to display characters in the tables with an iso10646
   font (via `set-fontset-font').

 * ccl.diff: Patches to add hash table lookup to CCL.

 * utf-16.el: New file for utf-16 coding systems.

 * lisp/international/utf-8-subst.el: New file for use by utf-8.el
   with CJK.

 * lisp/international/utf-8.el changes:

   * Fixes for behaviour with invalid utf-8 input;

   * Encoding more characters using ucs-tables.el (see
     ucs-mule-to-mule-unicode translation table);

   * Translation on decoding, e.g. fragmentation of Cyrillic and Greek
     from mule-unicode (see utf-8-fragment-on-decoding,
     utf-8-translation-table-for-decode);

   * Decoding CJK using utf-8-subst.el, similarly to Mule-UCS, if
     ccl.diff has been applied.  Needs some more work -- see fixme
     comment;

   * Without ccl.diff applied, display CJK sequences by composition
     using utf-8-subst.el;

   * On decoding, compose other valid, but untranslatable, utf-8
     sequences with help on the unicode;

   * Optional level 3 support for diacritics, Thai, Lao, Devanagari.

 * lisp/international/code-pages.el: New file with many extra
   Unicode-based 8-bit coding systems and a macro to build them.
   Should replace codepage.el.

 * lisp/international/mule-diag.el: Various changes to provide more
   information, e.g. about Unicodes, and support for code-pages.el.

 * lisp/international/latin1-disp.el: Extended, especially for Unicode.

 * lisp/international/mule-cmds.el: Mainly extended/corrected
   processing of locale specifications.

 * lisp/international/quail.el: Mainly provide translation of input
   through `standard-translation-table-for-input' so that the input
   method's coding can be made to conform to the buffer's.

 * lisp/language/cyrillic.el: Reworked.  Complete and correct koi8-r
   and alternativnyj.  Add koi8-u.  Add unification between ISO 8859-5
   and Unicode (including recoding for fonts).  Various new language
   modes (depending on ucs-tables.el, code-pages.el, Quail changes).

 * lisp/language/cyril-util.el: Fixed and extended.

 * lisp/language/georgian.el: New environment.

 * lisp/language/european.el: Extra environments.  Support for
   composition of diacritics.

 * lisp/language/utf-8-lang.el: utf-8 pseudo-language environment,
   auto file decoding for .utf.

 * leim/quail/cyrillic.el: Somewhat reworked and variously extended.

 * leim/quail/latin-ltx.el: Extended/corrected.

 * leim/quail/{sgml-input,uni-input,rfc1345,georgian,welsh}.el: New
   input methods.

 * lisp/language/{lao,lao-util,thai,thai-util}.el: Support for Unicode
   composition.

 * trans.el: Multibyte character translation and transcoding.

 * ucp.el: Finding un-encodable characters.  Potentially useful for
   finding encoding boundaries, e.g. multiple charset regions in Gnus
   MIME encoding, or simply finding an errant character with things
   like `select-safe-coding-system'.

 * autoload-coding-systems.diff: Patch to support autoloading coding
   systems.
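
To give a feel for the `decode-char'/`encode-char' interface mentioned
under ucs-tables.el above, here is a small round trip through the
`ucs' charset.  It is a sketch, not taken from the files themselves;
the code point #x20AC (EURO SIGN) is an arbitrary example.

```elisp
;; Sketch only; #x20AC is just an example code point.
(let* ((code #x20AC)
       ;; Unicode code point -> Emacs character (nil if there is none).
       (char (decode-char 'ucs code)))
  (and char
       ;; Emacs character -> Unicode code point; should match `code'.
       (= (encode-char char 'ucs) code)))
```

Evaluating the form should give a non-nil result when the code point
is covered by the available translation tables.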

Dave Love <address@hidden>  2002-04-16

Currently at ftp://dlpx1.dl.ac.uk/fx/emacs/Mule
