emacs-devel

Re: TUTORIAL.bg and windows-1251


From: Kenichi Handa
Subject: Re: TUTORIAL.bg and windows-1251
Date: Tue, 25 Nov 2003 08:55:52 +0900 (JST)

Sorry for the late responses on this thread.  I'm currently
involved in more threads than I can keep up with.

In article <address@hidden>, Ognyan Kulev <address@hidden> writes:

> Kenichi Handa wrote:
>>  I think the default handling of cyrillic characters must be
>>  most convenient for native users.  But, there are many
>>  languages that use cyrillic and their requests may conflict.
>>  So I think we must start from adjusting each language
>>  environment.  Once we found most language environments
>>  require the same setting, we can make it the default.

> Can X encoding be adjusted?  Aren't there only two choices for cyrillic: 
> iso10646-1 and iso8859-5?

It seems that under the bg_BG locale, glibc, GTK, or XFree86
(I don't know which one is responsible) encodes Cyrillic
characters in selections using an extended segment with the
charset name "microsoft-cp1251".
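
To make the byte layout of such an extended segment concrete, here is a minimal sketch in Emacs Lisp. It assumes the layout handled by the attached code: ESC % / N M L, then the encoding name, an STX byte (\002), and the payload, where M and L carry the length of everything after them as two bytes offset by 128. The function names `sketch-ctext-extended-segment` and `sketch-ctext-parse-extended-segment` are hypothetical (they are not part of Emacs or of the attached ctext.el), and the example assumes an Emacs that provides the `windows-1251` coding system.

```elisp
;; Hypothetical helpers for illustration only; not part of Emacs.

(defun sketch-ctext-extended-segment (name n-octet payload)
  "Return a unibyte extended segment wrapping unibyte PAYLOAD.
NAME is the ASCII encoding name and N-OCTET is the octets-per-character
digit (0 to 4)."
  ;; The length covers the name, the STX separator, and the payload.
  (let ((len (+ (length name) 1 (length payload))))
    (when (> len 16383)                 ; two 7-bit digits at most
      (error "Payload too long for one extended segment"))
    (concat (unibyte-string ?\e ?% ?/ (+ ?0 n-octet)
                            (+ 128 (/ len 128))   ; M: high 7 bits
                            (+ 128 (% len 128)))  ; L: low 7 bits
            (string-to-unibyte name)
            (unibyte-string 2)                    ; STX
            payload)))

(defun sketch-ctext-parse-extended-segment (segment)
  "Return (NAME . PAYLOAD) parsed from unibyte SEGMENT."
  (let* ((len (+ (* 128 (- (aref segment 4) 128))
                 (- (aref segment 5) 128)))
         (body (substring segment 6 (+ 6 len)))
         ;; The first STX byte ends the encoding name.
         (stx (string-match-p "\002" body)))
    (cons (substring body 0 stx)
          (substring body (1+ stx)))))

;; Usage: wrap two Cyrillic letters (U+0411, U+0413) encoded in
;; windows-1251, then unwrap and decode them again.
(let* ((text (string #x0411 #x0413))
       (payload (encode-coding-string text 'windows-1251))
       (segment (sketch-ctext-extended-segment
                 "microsoft-cp1251" 1 payload))
       (parsed (sketch-ctext-parse-extended-segment segment)))
  (list (car parsed)
        (decode-coding-string (cdr parsed) 'windows-1251)))
```

The sketch works on unibyte strings only so that the length bytes count octets, not characters; this mirrors how the attached code encodes the text first and computes the length afterwards.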

Please try the attached file.  It overrides the ctext
encoder/decoder so that microsoft-cp1251 is recognized when
decoding and is used when encoding in the Bulgarian language
environment.

[...]
> The downside of the Debian packages is that each of the four encodings 
> mentioned above has its own package.  So people sometimes install only 
> microsoft-cp1251 and iso10646-1 fonts, without the koi8-r and iso8859-5 ones.

> Another problem with cronyx-courier is that it doesn't work when it's 
> set for the Default face in the Basic Faces customize group.  I've just 
> posted a question to comp.emacs.

> What about the following: when mule-unicode-0100-24ff is used and the 
> iso10646-1 font in use doesn't contain the wanted character (e.g. a 
> cyrillic one), another font that does contain it is searched for.  I 
> think this will often end up in cronyx-courier.  Is this hard to 
> implement?

I've implemented it in the emacs-unicode version.  But that
change requires various pieces of emacs-unicode infrastructure,
so it's very difficult to backport it to HEAD.

Anyway, the attached ctext.el also contains a short piece of
code that enables Emacs to display windows-1251 characters
with a microsoft-cp1251 font.  Please try calling
(use-microsoft-cp1251-font).

---
Ken'ichi HANDA
address@hidden

--- ctext.el ---
(defvar ctext-non-standard-encodings-database
  '(("big5-0" big5 2 (chinese-big5-1 chinese-big5-2)))
  "Alist of non-standard character set encodings for CTEXT's extended segments.
Each element has the form (ENCODING-NAME CODING-SYSTEM N-OCTET CHARSET)
and provides information about how to use \"extended segments\"
with the encoding name ENCODING-NAME.

CODING-SYSTEM is the coding-system to encode the characters into
an extended segment.

N-OCTET is the number of octets (bytes) that encodes a character
in the segment.  It can be 0 (meaning the number of octets per
character is variable), 1, 2, 3, or 4.

CHARSET is a character set containing characters that are encoded
as ENCODING-NAME.  It may be a list of character sets.  It may
also be a char-table, in which case characters that have non-nil
value in the char-table are the target.

On decoding CTEXT, all encoding names listed here are recognized.

On encoding CTEXT, encoding names in the variable
`ctext-non-standard-encodings-list' and in
`ctext-non-standard-encodings' property of the current language
environment are used.")

(defun ctext-post-read-conversion (len)
  "Decode LEN characters encoded as Compound Text with Extended Segments."
  (save-match-data
    (save-restriction
      (let ((case-fold-search nil)
            (in-workbuf (string= (buffer-name) " *code-converting-work*"))
            last-coding-system-used
            pos bytes)
        (or in-workbuf
            (narrow-to-region (point) (+ (point) len)))
        (decode-coding-region (point-min) (point-max) 'ctext)
        (if in-workbuf
            (set-buffer-multibyte t))
        (while (re-search-forward ctext-non-standard-encodings-regexp
                                  nil 'move)
          (setq pos (match-beginning 0))
          (if (match-beginning 1)
              ;; ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
              (let* ((M (char-after (+ pos 4)))
                     (L (char-after (+ pos 5)))
                     (encoding (match-string 2))
                     (encoding-info (assoc-ignore-case 
                                     encoding
                                     ctext-non-standard-encodings-database))
                     (coding (if encoding-info
                                 (nth 1 encoding-info)
                               (setq encoding (intern (downcase encoding)))
                               (and (coding-system-p encoding)
                                    encoding))))
                (setq bytes (- (+ (* (- M 128) 128) (- L 128))
                               (- (point) (+ pos 6))))
                (when coding
                  (delete-region pos (point))
                  (forward-char bytes)
                  (decode-coding-region (- (point) bytes) (point) coding)))
            ;; ESC % G --UTF-8-BYTES-- ESC % @
            (setq bytes (- (point) pos))
            (decode-coding-region (- (point) bytes) (point) 'utf-8))))
      (goto-char (point-min))
      (- (point-max) (point)))))

(defvar ctext-non-standard-encodings-list
  '("big5-0")
  "List of non-standard character set encoding names used in CTEXT.")

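(defun ctext-non-standard-encodings-table ()
  "Return a char-table mapping characters to extended-segment info.
Each non-nil value is an element of
`ctext-non-standard-encodings-database'."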
  (let ((table (make-char-table 'translation-table)))
    (dolist (encoding (reverse
                       (append
                        (get-language-info current-language-environment
                                           'ctext-non-standard-encodings)
                        ctext-non-standard-encodings-list)))
      (let* ((slot (assoc encoding ctext-non-standard-encodings-database))
             (charset (nth 3 slot)))
        (if charset
            (cond ((charsetp charset)
                   (aset table (make-char charset) slot))
                  ((listp charset)
                   (dolist (elt charset)
                     (aset table (make-char elt) slot)))
                  ((char-table-p charset)
                   (map-char-table #'(lambda (k v) 
                                   (if (and v (> k 128)) (aset table k slot)))
                                   charset))))))
    table))

(defun ctext-pre-write-conversion (from to)
  "Encode characters between FROM and TO as Compound Text w/Extended Segments.

If FROM is a string, or if the current buffer is not the one set up for us
by encode-coding-string, generate a new temp buffer, insert the
text, and convert it in the temporary buffer.  Otherwise, convert in-place."
  (save-match-data
    ;; Setup a working buffer if necessary.
    (cond ((stringp from)
           (let ((buf (current-buffer)))
             (set-buffer (generate-new-buffer " *temp"))
             (set-buffer-multibyte (multibyte-string-p from))
             (insert from)))
          ((not (string= (buffer-name) " *code-converting-work*"))
           (let ((buf (current-buffer))
                 (multibyte enable-multibyte-characters))
             (set-buffer (generate-new-buffer " *temp"))
             (set-buffer-multibyte multibyte)
             (insert-buffer-substring buf from to))))

    ;; Now we can encode the whole buffer.
    (let ((encoding-table (ctext-non-standard-encodings-table))
          last-coding-system-used
          last-pos last-encoding-info
          pos encoding-info end-pos)
      (goto-char (setq last-pos (point-min)))
      (setq end-pos (point-marker))
      (while (re-search-forward "[^\000-\177]+" nil t)
        (setq last-pos (match-beginning 0)
              last-encoding-info (aref encoding-table (char-after last-pos)))
        (set-marker end-pos (match-end 0))
        (goto-char (1+ last-pos))
        (catch 'tag
          (while t
            (setq encoding-info
                  (if (< (point) end-pos)
                      (aref encoding-table (following-char))))
            (unless (eq last-encoding-info encoding-info)
              (if last-encoding-info
                  (let ((encoding-name (car last-encoding-info))
                        (coding-system (nth 1 last-encoding-info))
                        (noctets (nth 2 last-encoding-info))
                        len)
                    (encode-coding-region last-pos (point) coding-system)
                    (setq len (+ (length encoding-name) 1
                                 (- (point) last-pos)))
                    (save-excursion
                      (goto-char last-pos)
                      (insert (string-to-multibyte 
                               ;; The header ends with STX (\002), which
                               ;; is already counted in LEN above.
                               (format "\e%%/%d%c%c%s\002"
                                       noctets
                                       (+ (/ len 128) 128)
                                       (+ (% len 128) 128)
                                       encoding-name)))))
                (encode-coding-region last-pos (point) 'ctext-no-compositions))
              (setq last-pos (point)
                    last-encoding-info encoding-info))
            (if (< (point) end-pos)
                (forward-char 1)
              (throw 'tag nil))))
        (if (< last-pos (point))
            (encode-coding-region last-pos (point) 'ctext-no-compositions)))
      (set-marker end-pos nil)
      (goto-char (point-min))))
  ;; Must return nil, as build_annotations_2 expects that.
  nil)

;; The following forms override the current settings.

(set-language-info "Bulgarian" 'ctext-non-standard-encodings
                   '("microsoft-cp1251"))

(let ((elt `("microsoft-cp1251" windows-1251 1
             ,(get 'encode-windows-1251 'translation-table)))
      (slot (assoc "microsoft-cp1251" ctext-non-standard-encodings-database)))
  (if slot
      (setcdr slot (cdr elt))
    (push elt ctext-non-standard-encodings-database)))

(define-ccl-program ccl-encode-windows-1251-font
  '(0
    ((r1 <<= 7)
     (r1 += r2)
     (translate-character encode-windows-1251 r0 r1)
     )))

(let ((slot (assoc "microsoft-cp1251" font-ccl-encoder-alist)))
  (if slot
      (setcdr slot ccl-encode-windows-1251-font)
    (push '("microsoft-cp1251" . ccl-encode-windows-1251-font)
          font-ccl-encoder-alist)))

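(defun use-microsoft-cp1251-font ()
  "Display windows-1251 characters with a microsoft-cp1251 font.
This changes the entries of \"fontset-default\"."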
  (let ((fontspec '(nil . "microsoft-cp1251")))
    (map-char-table
     #'(lambda (k v) 
         (if (and v (> k 128))
             (set-fontset-font "fontset-default" k fontspec)))
     (get 'encode-windows-1251 'translation-table))))




