Re: [PATCHes] Add basic multibyte charset handling to makeinfo
From: Miloslav Trmac
Subject: Re: [PATCHes] Add basic multibyte charset handling to makeinfo
Date: Tue, 05 Dec 2006 12:51:29 +0100
User-agent: Thunderbird 1.5.0.8 (X11/20061107)
Eli Zaretskii wrote:
>> Date: Mon, 4 Dec 2006 16:18:53 -0600
>> From: address@hidden (Karl Berry)
>> Cc: address@hidden
>>
>> The attached patches add support for multibyte character sets (e.g.
>> UTF-8) and multi-column characters (e.g. Chinese) to makeinfo.
> Not that I don't think this is great and don't thank Miloslav; I do.
> But can we please first discuss the problem with using the locale's
> encoding instead of @documentencoding? Surely, we can solve that,
> can't we?
Not really. AFAIK:
- character set names are not portable across operating systems;
- even if you know that "iso-8859-1" is an acceptable character set
name, that doesn't mean a locale using that character set exists:
a locale named $current_locale.iso-8859-1 most likely doesn't exist.
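To illustrate the second point: on POSIX systems, the only portable way to find out whether a locale exists is to try to switch to it with setlocale (), which returns NULL when the name is not available. The helper below is a minimal sketch; the name ctype_locale_exists is hypothetical and not part of makeinfo or the patches.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if the C library accepts `name` as an
   LC_CTYPE locale, 0 otherwise.  The previous LC_CTYPE setting is
   restored before returning.  */
static int ctype_locale_exists(const char *name)
{
    /* setlocale() may overwrite the returned string on the next call,
       so keep a private copy of the current locale name.  */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    if (cur != NULL) {
        size_t n = strlen(cur) + 1;
        saved = malloc(n);
        if (saved != NULL)
            memcpy(saved, cur, n);
    }

    /* NULL means the requested locale is not installed/known. */
    int ok = setlocale(LC_CTYPE, name) != NULL;

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return ok;
}
```

On a system where only en_US.UTF-8 has been generated, a probe such as ctype_locale_exists ("en_US.ISO-8859-1") may well return 0, even though "ISO-8859-1" itself is a perfectly valid character set name.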
So, if we want @documentencoding, we can't use system locales, and we
need a replacement that provides, at minimum, equivalents of mbtowc ()
and wcwidth (). It is completely unreasonable to implement this directly
inside the texinfo sources, and I don't think it is really practical to
make texinfo dependent on some other library that provides this
functionality (ICU, maybe?).
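For reference, this is roughly what the two functions are needed for: decoding the input into characters and counting how many screen columns each one occupies (2 for most Chinese characters), e.g. when filling paragraphs or underlining headings. The sketch below is hypothetical (display_width is not a makeinfo function); it uses mbrtowc (), the restartable variant of mbtowc (), and the X/Open function wcwidth (). Both interpret bytes according to the current LC_CTYPE locale, which is exactly why the locale's character set, not @documentencoding, ends up deciding how the text is read.

```c
#define _XOPEN_SOURCE 700   /* wcwidth() is an X/Open extension */
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the number of terminal columns needed to
   display the multibyte string `s` in the current LC_CTYPE locale.
   Invalid or incomplete byte sequences count as one column per byte;
   non-printable characters (wcwidth() < 0) count as zero columns.  */
static size_t display_width(const char *s)
{
    mbstate_t st;
    size_t len = strlen(s);
    size_t cols = 0;

    memset(&st, 0, sizeof st);
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Bad byte: count it as one column and resynchronize. */
            memset(&st, 0, sizeof st);
            s++;
            len--;
            cols++;
            continue;
        }
        if (n == 0)   /* NUL character; cannot occur since len = strlen(s) */
            break;
        int w = wcwidth(wc);
        if (w > 0)
            cols += (size_t)w;
        s += n;
        len -= n;
    }
    return cols;
}
```

A program using this would call setlocale (LC_ALL, "") at startup so that the user's (system's) character set is used; in a UTF-8 locale, display_width ("\xe4\xb8\xad\xe6\x96\x87") (two Chinese characters) should then be 4, although the exact widths depend on the C library's locale data.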
The standalone info reader ignores the "Local Variables: coding: ..."
trailer anyway, so the assumption that info files use the system's
character set is already present, although makeinfo doesn't currently
rely on it.
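A program that follows this assumption can ask the C library which character set the current locale uses via nl_langinfo (CODESET). Because the returned names are themselves not portable (glibc reports "UTF-8", other systems may use spellings like "utf8"), the hypothetical helper below compares loosely, ignoring case, '-' and '_'. It is a sketch, not code from texinfo.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return 1 if the current LC_CTYPE locale's
   character set is UTF-8, 0 otherwise.  Comparison ignores case and
   any '-' or '_' characters in the reported name.  */
static int locale_charset_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    const char *want = "utf8";

    for (; *cs != '\0'; cs++) {
        if (*cs == '-' || *cs == '_')
            continue;
        if (*want == '\0' || tolower((unsigned char)*cs) != *want)
            return 0;
        want++;
    }
    return *want == '\0';
}
```

As with the other locale functions, the result is only meaningful after setlocale (LC_ALL, "") has selected the user's locale; in the default "C" locale the charset is plain ASCII.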
The UNIX world basically assumes a single system-wide character set (at
least, all file names in the file system must use a single character
set). Adding a character set indication to every text file format, and
character set conversion to every program using that format, is
technically possible but not practical: it is too much work, it adds
confusing failure modes, and it breaks the traditional usage of text
manipulation tools.
Thus I prefer a model in which all info files installed on the system
use a common character set, which is the same as the character set the
system is using for other purposes (UTF-8 is the obvious candidate). If
the .texi files in released tarballs don't use this character set,
converting them would be the distributor's task.
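Such a conversion is normally a one-off job with the iconv (1) command; the same thing can be done programmatically with iconv (3). The function below is a hypothetical sketch (convert_charset is not part of texinfo). The charset names "ISO-8859-1" and "UTF-8" are accepted by glibc's iconv, but, as noted above, charset names in general are not guaranteed to be portable across systems.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from
   charset `from` to charset `to`.  Returns a newly malloc()ed,
   NUL-terminated string, or NULL on failure (unknown charset name,
   invalid input, output larger than the buffer bound, or no memory). */
static char *convert_charset(const char *in, const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);   /* note: target first */
    if (cd == (iconv_t)-1)
        return NULL;

    size_t inleft = strlen(in);
    /* 4 bytes per input byte covers any conversion into UTF-8. */
    size_t outsize = inleft * 4 + 1;
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;        /* iconv() takes char **, not const */
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep room for the final NUL */

    /* Convert everything, then flush any pending shift state. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
        || iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

For example, convert_charset ("caf\xe9", "ISO-8859-1", "UTF-8") should yield the UTF-8 bytes "caf\xc3\xa9"; the caller must free () the result.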
Mirek