eight-bit char handling in emacs-unicode
From: Kenichi Handa
Subject: eight-bit char handling in emacs-unicode
Date: Fri, 14 Nov 2003 09:47:51 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <address@hidden>, Simon Josefsson <address@hidden> writes:
> rfc2104.el now works, thanks. But does the fix really have to
> explicitly mention charsets like iso-latin-1? Is there no way to
> handle binary octet strings in emacs-unicode? Preferably in a
> portable way, that works on old Emacs versions and on XEmacs.
>> This is a typical problem of emacs-unicode in which
>> characters 128..255 are valid Unicode characters, thus, for
>> instance, (concat '(?a ?\300)) returns a multibyte string of
>> `a' and `À'. But in the current Emacs, it returns a unibyte
>> string.
>>
>> I suspect a similar fix is necessary in several other
>> places.
> Having a way to deal with data that is a pure single byte, without
> involving coding systems, seems like a rather important thing to me.
I agree with you. Currently, I can think of these methods:
(1) Perhaps the easiest way.
Check `default-enable-multibyte-characters' or a newly
introduced variable `byte-as-byte' to decide whether an
integer 128..255 must be treated as a Latin-1 char or a
byte. So,
    (concat '(?a ?\300)) => "aÀ" (multibyte string)
    (let ((byte-as-byte t))
      (concat '(?a ?\300))) => "a\300" (unibyte string)
(2) Introduce a new function `eight-bit-char'.
It converts its argument to an ASCII char or an eight-bit char.
    (eight-bit-char ?a) => 97
    (eight-bit-char ?\300) => 4194240
Then,
    (concat (list ?a (eight-bit-char ?\300))) => "a\300"
(3) Make a series of new functions (I don't think this is a good idea)
    concat vs concat-unibyte
    string vs string-unibyte
    aset vs aset-unibyte
(4) Most drastic way (the cleanest, but it requires lots of work)
The basic problem is that we don't distinguish between a
character (code) and a number. So, we introduce a character
object (like XEmacs). The function `character' converts a
character code into the corresponding character object. The
Lisp reader always generates a character object for ?a,
?\300, etc. So:
    (concat '(?a ?\300)) => "aÀ"
    (concat '(?a #o300)) => "a\300"
    (concat (list ?a (character #o300))) => "aÀ"
    (concat (list ?a #o300 (character #o300))) => "a\300À"
Note: (character X) == (decode-char 'ucs X)
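To make the mapping in (2) concrete, here is a minimal sketch. It
assumes that an eight-bit char is simply the byte value plus a fixed
offset; the offset #x3FFF00 is inferred from the value 4194240 shown
for ?\300 in (2). The `my-' names are hypothetical and exist only for
this illustration; they are not functions in emacs-unicode.

```elisp
;; Hypothetical sketch of option (2).  The offset is an assumption
;; derived from (eight-bit-char ?\300) => 4194240, i.e. 192 + #x3FFF00.
(defconst my-eight-bit-offset #x3FFF00
  "Assumed distance between a byte 128..255 and its eight-bit char code.")

(defun my-eight-bit-char (byte)
  "Return BYTE unchanged if it is ASCII, else its eight-bit char code.
Signal an error if BYTE is outside 0..255."
  (cond ((or (< byte 0) (> byte 255))
         (error "Not a byte: %S" byte))
        ((< byte 128) byte)
        (t (+ byte my-eight-bit-offset))))

(defun my-eight-bit-char-to-byte (char)
  "Inverse of `my-eight-bit-char': return the byte that CHAR stands for."
  (if (< char 128)
      char
    (- char my-eight-bit-offset)))
```

With these definitions, (my-eight-bit-char ?a) gives 97 and
(my-eight-bit-char #o300) gives 4194240, matching the values in (2).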
> It started now, but when I enter a summary buffer it crashed:
> Program received signal SIGSEGV, Segmentation fault.
> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
> 1591 char_ranges[n_char_ranges++] = c;
> (gdb) bt
> #0 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
I just tried gnus but I couldn't reproduce it. So, I need
more help. Could you show me the results of the following?
(gdb) p n_char_ranges
(gdb) p c
(gdb) p string
(gdb) xstring
(gdb) p *$
---
Ken'ichi HANDA
address@hidden