Re: eight-bit char handling in emacs-unicode

From: Stefan Monnier
Subject: Re: eight-bit char handling in emacs-unicode
Date: 18 Nov 2003 12:12:10 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

>>> The basic problem is that we don't distinguish a character
>>> (code) and a number.  So, we introduce a character object

>> That's one way to look at the problem.
>> Another is to say that the problem is instead that we do not distinguish
>> between arrays of chars and arrays of bytes.

> I agree that it's possible to grasp the problem in that way,
> but I'm not sure which is the better way.  Could you explain
> WHY yours is better?

I'm not sure whether it's better or worse.  The problem I have with the
introduction of a new type for chars is that it is a change with
far-reaching consequences, and I'm not sure it would solve all our problems,
since many of them stem from bad elisp code.

>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>> `make-string-unibyte' and `make-string-multibyte'.

> I think you mean string-make-unibyte/multibyte, but, for the
> current problem, we can't use it because string-make-unibyte
> may behave differently in different language environments.
> A lang. env. that gives iso-8859-1 or Unicode the
> highest priority for the character `À' is ok.

> (string-make-unibyte (concat '(?a 192))) = "a\300"

> But, if some lang. env. prefers a charset for `À' that
> does not encode it as 192 (e.g. Vietnamese VSCII), we fail.

No.  My `make-string-unibyte' should only work to convert "bytes in
multibyte string" to "bytes in unibyte string": there's no char, thus no
coding-system.  If the multibyte string argument contains a char that's
not an eight-bit-char, then it's an error.
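
A minimal sketch of the strict conversion described above, for readers
who want something concrete.  The name `sketch-make-string-unibyte' and
the implementation are assumptions made for illustration; this is not
the code of the change discussed in this message.  It relies only on the
existing primitives `multibyte-string-p', `multibyte-char-to-unibyte'
and `unibyte-string'.

```elisp
;; Hypothetical helper: name and implementation are illustrative only.
(defun sketch-make-string-unibyte (string)
  "Return a unibyte string holding the bytes of STRING.
Signal an error if multibyte STRING contains a char that is neither
ASCII nor an eight-bit (raw byte) char.  No coding system is involved."
  (if (not (multibyte-string-p string))
      string                            ; already bytes, nothing to do
    (apply #'unibyte-string
           (mapcar
            (lambda (ch)
              (cond
               ;; ASCII chars map to themselves.
               ((< ch 128) ch)
               ;; 128..255 in a multibyte string is a real (Latin-1)
               ;; char, not a raw byte, so reject it.
               ((< ch 256) (error "Not a byte char: %d" ch))
               ;; For larger codes, `multibyte-char-to-unibyte'
               ;; returns -1 unless CH represents a raw byte.
               (t (let ((byte (multibyte-char-to-unibyte ch)))
                    (if (< byte 0)
                        (error "Not a byte char: %d" ch)
                      byte)))))
            string))))
```

With this sketch, (sketch-make-string-unibyte (string-to-multibyte "a\300"))
gives back the two bytes, whereas passing (concat '(?a 192)) signals an
error because the second element is the char `À', not a byte.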

To do what your string-make-unibyte does you should use
`encode-coding-string' where the coding system is passed explicitly.
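
As an illustration of passing the coding system explicitly, the
following calls encode the same two-char string with two different
coding systems.  The choice of `iso-8859-1' and `utf-8' is only an
example; the point is that the result no longer depends on the language
environment.

```elisp
;; The string "aÀ", built from char codes so the example does not
;; depend on how this file itself is encoded.
(let ((s (string ?a #xC0)))
  (list
   ;; Latin-1 encodes `À' as the single byte 192 (octal 300).
   (encode-coding-string s 'iso-8859-1)   ; => "a\300"
   ;; UTF-8 encodes `À' as the two bytes 195 128 (octal 303 200).
   (encode-coding-string s 'utf-8)))      ; => "a\303\200"
```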

I've changed my Emacs so that string-make-unibyte does the above
(i.e. signals an error if it encounters a non-byte char) and it works fairly
well, except for the few places where the elisp code is sloppy and needs to
be fixed.

>> Note that 1-3 are not mutually exclusive so we can use
>> them all.

> Yes, but, at least, I really want to avoid "(3) Make a
> series of new functions".

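(defun concat-unibyte (&rest x)
  "Concatenate X and return the result as a unibyte string.
Relies on the `make-string-unibyte' proposed above."
  (make-string-unibyte (apply #'concat x)))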

so we don't need this series of new functions, but if some of them are used
often enough, we can add them of course.
