Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF

emacs-devel

From:	Rustom Mody
Subject:	Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date:	Sun, 27 Sep 2015 14:50:48 +0530

On Sun, Sep 27, 2015 at 1:12 PM, David Kastrup <address@hidden> wrote:
>
> Eli Zaretskii <address@hidden> writes:
>
> > I've also looked at the *.po files in the latest releases of GNU Make,
> > Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
> > such files still use non-UTF-8 encodings.
>
> Which, btw, I consider crazy.
>


I've been trying to understand this stuff and was looking at, e.g.,
lisp/language/indian.el

In there I find that:
(defconst bengali-composable-pattern
  (let ((table
     '(("a" . "\u0981")        ; SIGN CANDRABINDU
       ("A" . "[\u0982-\u0983]")    ; SIGN ANUSVARA .. VISARGA
       ("V" . "[\u0985-\u0994\u09E0-\u09E1]") ; independent vowel
       ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
       ("B" . "[\u09AC\u09AF-\u09B0\u09F0]")        ; BA, YA, RA
       ("R" . "[\u09B0\u09F0]")        ; RA
       ("n" . "\u09BC")        ; NUKTA
       ("v" . "[\u09BE-\u09CC\u09D7\u09E2-\u09E3]") ; vowel sign
       ("H" . "\u09CD")        ; HALANT
       ("T" . "\u09CE")        ; KHANDA TA
       ("N" . "\u200C")        ; ZWNJ
       ("J" . "\u200D")        ; ZWJ
       ("X" . "[\u0980-\u09FF]"))))    ; all coverage
etc.

This is repeated with small variations for Devanagari, Tamil, Telugu, etc.
It would surely help a native speaker if the strings used the actual
characters, with the UCS hex codes moved into the comments instead.
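
As a rough sketch of how one can at least see the actual characters behind
those escapes without touching the file: the helper below lists each code
point next to its literal character and Unicode name. The name
`my-show-code-points' is hypothetical (it is not in indian.el or anywhere in
Emacs); `get-char-code-property' and `char-to-string' are standard.

```elisp
;; Hypothetical helper, not part of Emacs: shows what each \uXXXX escape
;; in a table like `bengali-composable-pattern' actually stands for.
(defun my-show-code-points (code-points)
  "Return a list of (HEX CHAR-STRING NAME) entries for CODE-POINTS."
  (mapcar (lambda (cp)
            (list (format "U+%04X" cp)               ; e.g. "U+0981"
                  (char-to-string cp)                ; the literal character
                  (get-char-code-property cp 'name)))  ; Unicode name
          code-points))

;; A few of the code points from the Bengali table quoted above.
(my-show-code-points '(#x0981 #x09BC #x09CD #x09CE))
```

Evaluating the last form in *scratch* gives one entry per code point, which
is roughly the information a native speaker would want to see in the source
itself.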

So then I checked why the file was showing as UTF-8 encoded.

Found this one non-ASCII line:

(set-language-info-alist
 "Kannada" '((charset unicode)
         (coding-system mule-utf-8)
         (coding-priority mule-utf-8)
         (input-method . "kannada-itrans")
         (sample-text . "Kannada (ಕನ್ನಡ)    ನಮಸ್ಕಾರ")
         (documentation . "\
Kannada language and script is supported in this language
environment."))
 '("Indian"))

It strikes me that the other languages should have sample text like this
too, but it does not seem to be there.

Just for context: if I can understand what's going on, I would like to
help improve these docs:


(info "(elisp)input methods")

  | How to define input methods is not yet documented in this manual,
  | but here we describe how to use them.
