Patch for some Cyrillic languages

From: Anton Zinoviev
Subject: Patch for some Cyrillic languages
Date: Fri, 26 Sep 2003 19:02:40 +0300
User-agent: Mutt/1.3.28i


Attached you can find a patch that makes several changes to Emacs (I
didn't know how to break it into smaller pieces).  My comments about it
follow.


Added Belarusian, Bulgarian, Mongolian, Serbian and Ukrainian


Very small documentation changes to bulgarian-phonetic and

New keyboard layouts: bashkir, kazakh, mongolian and cyrillic-prefix
(the purpose of the last is to support all Cyrillic languages in a
convenient way).


x-font-name-charset-alist: koi8-u and cp1251 fonts can also be used
for mule-unicode-0100-24ff characters.  This is because these Cyrillic
codesets are not fully covered by ISO 8859-5 or any other 8-bit
charset.  If a user requests a koi8-u or cp1251 font with `-fn', then
he/she probably wants to use it for symbols that are not part of
ISO 8859-5.
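
The coverage gap can be checked outside Emacs.  The following is only
a sketch in Python: it assumes that Python's standard codecs
(`koi8_u', `cp1251', `iso8859_5') match the codesets meant here, and
the helper names are made up for the illustration.  It collects the
letters of the Unicode Cyrillic block that koi8-u or cp1251 can encode
but ISO 8859-5 cannot.

```python
def encodable(ch, codec):
    """Return True if the single character `ch` can be encoded with `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def beyond_iso8859_5(codec):
    """Cyrillic-block characters that `codec` encodes but ISO 8859-5 does not."""
    cyrillic_block = (chr(cp) for cp in range(0x0400, 0x0500))
    return [ch for ch in cyrillic_block
            if encodable(ch, codec) and not encodable(ch, "iso8859_5")]


# Hypothetical variable names; the codec names are Python's, not Emacs's.
koi8u_extra = beyond_iso8859_5("koi8_u")
cp1251_extra = beyond_iso8859_5("cp1251")
print(koi8u_extra, cp1251_extra)
```

The Ukrainian letter U+0491 is one such character: a font for either
of these codesets has a glyph for it, while an ISO 8859-5 font has not.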


locale-language-names: Added support for Bashkir, Kazakh, Mongolian.
Changed language environments for Macedonian and Serbian.



   * Changed the input method from cyrillic-yawerty to
     cyrillic-translit.  Reason: cyrillic-translit is language
     independent and can generate all letters from ISO 8859-5.

   * The sample-text contains greetings for some ISO 8859-5 languages
     (as this is done with latin-N language environments).  Similar
     change to the documentation of the language environment.

   * New language environments: Macedonian and Serbian.

cyrillic-unify-encoding: I suppose that the change I made is correct,
but I don't completely understand how cyrillic-unify-encoding works:

@@ -158,6 +192,7 @@
              (cond ((eq c ? ) ? )
                    ((eq c ?-) ?-)
                    ((eq c ?S) ?S)
+                   ((eq c ?#) ?#)
                    (t (decode-char 'ucs (+ #x400 i)))))
             (ec (aref table c))        ; encoding of 8859-5
             (uc (aref table u)))       ; encoding of Unicode
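
The `(+ #x400 i)' rule in this hunk relies on the regular layout of
ISO 8859-5: the i-th position of its upper half normally corresponds
to U+0400 + i.  The sketch below (Python, standard `iso8859_5' codec;
not the Emacs code) lists the positions that break this rule.  That
these are exactly the special cases of the `cond' above is my
assumption.

```python
# Upper (graphic) half of ISO 8859-5: bytes 0xA0..0xFF, i.e. i = 0..95.
exceptions = []
for i in range(96):
    ch = bytes([0xA0 + i]).decode("iso8859_5")
    if ord(ch) != 0x400 + i:
        # This position does not follow the "U+0400 + i" rule.
        exceptions.append(ch)

# Print the exceptional characters as code points.
print(["U+%04X" % ord(ch) for ch in exceptions])
# prints ['U+00A0', 'U+00AD', 'U+2116', 'U+00A7']
```

These are the no-break space, the soft hyphen, the numero sign and
the section sign.  Every other position follows the offset rule.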


   * I haven't made any changes to the generic language environment
     for KOI8-R.  However, this encoding is used exclusively for
     Russian, and given that there is a specific language environment
     for Russian, I think that the generic environment for KOI8-R is
     obsolete.

   * Changes for the Russian environment: 1. added utf-8 as an
     alternative coding system to koi8-r; 2. more meaningful
     sample-text ("Happy work with Emacs").


   * ccl-encode-koi8-u-font: small spelling correction.

   * Changes for the Ukrainian language environment: 1. added utf-8 as
     an alternative coding system to koi8-u; 2. added sample-text (I
     don't know Ukrainian well, but this seems to be "While the
     Ukrainian speech is alive, the Ukrainian people will be alive").


   * No changes.  The generic language environment for alternativnyj
     does no harm.  On the other hand, I suppose that nobody will
     complain if you remove it.

Tajik language environment:

   * No changes.  I can say nothing definite about the Tajik language.
     I am not sure whether Tajiks still use the Cyrillic alphabet.  I
     have never seen a font for KOI8-T; moreover, I think this encoding
     is not supported by XFree.


This coding system was implemented differently than the others.  A
buffer with symbols from the iso-8859-5 character set cannot be saved
using CP1251 (although iso-8859-5 is a subset of CP1251).  There was
no support for cp1251 fonts and no mime-charset.  I haven't given much
thought to this; I simply reimplemented this coding system the same
way as the other Cyrillic coding systems are implemented.
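
The subset claim can be verified independently of Emacs.  This is
only a sketch in Python, assuming its standard `iso8859_5' and
`cp1251' codecs describe the same codesets; the function name is
hypothetical.  It checks the graphic characters only (bytes
0xA0..0xFF), since the C1 control positions of ISO 8859-5 have no
counterpart in CP1251.

```python
def iso8859_5_chars_missing_from_cp1251():
    """Graphic ISO 8859-5 characters that CP1251 cannot encode."""
    missing = []
    for byte in range(0xA0, 0x100):
        ch = bytes([byte]).decode("iso8859_5")
        try:
            ch.encode("cp1251")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(iso8859_5_chars_missing_from_cp1251())
```

An empty list means every such character has a CP1251 encoding, so a
buffer that contains only these symbols should be savable as CP1251.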

While doing this I renamed this coding system from windows-1251 to
cp1251 (the former remains as an alias).  Reason: the name CP1251 is
used by Glibc and GNU utilities (including gettext); this name is what
most people are used to and will expect from Emacs.

This patch doesn't include a generic language environment for CP1251.
I will make such an environment later, since CP1251 is widely used for
several languages.

   * Bulgarian and Belarusian language environments: added utf-8 as an
     alternative coding system.  Added sample-text (the Bulgarian is
     "Happy working with the editor Emacs!"; I don't know Belarusian
     well, but this seems to be "Lives during the ages the
     Belarusian speech -- spirit and glory of the people").


I added four UTF-8 language environments: Cyrillic-UTF-8 (a generic
Cyrillic environment for UTF-8), Bashkir, Kazakh and Mongolian.


Anton Zinoviev

Attachment: emacs-cyrillic.diff.gz
Description: Binary data
