From: Kenichi Handa
Subject: Re: address@hidden: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
Date: Tue, 24 Oct 2006 09:30:38 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)
In article <address@hidden>, Richard Stallman <address@hidden> writes:
> Would you please look at this issue and comment?
> I am not sure if this is something we should try to fix, now or ever.
> But I would like you to think about it.
Sorry for the late response. Actually, there's not much
we can do about this.
> ------- Start of forwarded message -------
> Date: Wed, 11 Oct 2006 15:16:50 -0400
> To: address@hidden
> From: Rich Felker <address@hidden>
> Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs
> & ideas for fix)
[...]
> When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
> files, it treats all characters as if they occupy one character cell
> column on the terminal. This causes it to become confused about the
> cursor position whenever there is CJK fullwidth text or scripts that
> use nonspacing combining characters present, to the point that editing
> is impossible.
Unfortunately, the current Emacs assumes that all characters
in a charset have the same width. As long as we were dealing
with legacy charsets (e.g. ISO8859, JISX, KSC, GB), that
assumption worked well.
> Attached to this email is a UTF-8 file you can open in Emacs which
> exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
> Thai (for nonspacing).
> The root of the problem: In term.c, produce_glyphs() function, the
> code assumes all multibyte characters for a given 'charset' have the
> same width:
The root of the problem is that there's no way for Emacs to
know how many columns a terminal uses to display a specific
character. For Hiragana, Emacs can guess that it will be
displayed in two columns, but for Tibetan and Thai, it
depends heavily on the terminal's capability of handling
CTL (Complex Text Layout). If a terminal doesn't know how
to do CTL for Tibetan, it will just produce glyphs for each
syllable component without stacking them (and thus occupy
several columns). If a terminal does, it will display them
in one (or two) columns. But there's no way for Emacs to
know which is the case.
> Correctly fixing the issue:
> 1. Needs some sort of width lookup for unicode characters without
> having to convert from Emacs' native encoding to UCS thru UTF-8.
> This should be straightforward for someone who understands the
> code.
That only works for simple characters such as Hiragana. In
the emacs-unicode-2 branch, I introduced char-width-table,
which maps each character to the number of columns it
occupies on screen.
> 2. The append_glyph() function needs to handle width==0 case, perhaps
> converting the previous glyph into a COMPOSITE_GLYPH instead of
> adding a CHAR_GLYPH. However I don't understand the COMPOSITE_GLYPH
> system in Emacs so I don't know if this is feasible.
COMPOSITE_GLYPH is a glyph containing multiple characters
that must be displayed as a single grapheme cluster. On X,
Emacs displays the characters in a COMPOSITE_GLYPH correctly
(sometimes by stacking, sometimes by overstriking, sometimes
by using alternate glyphs, etc.). But, as there's no way to
perform such operations on a terminal, the current Emacs
just displays the first character of a COMPOSITE_GLYPH.
> At present this issue is making it very difficult for me to use
> Tibetan text in composing email and material for the web, so I'm
> looking for some way to fix it, either upstream or with hacks I can
> make locally for the time being until it's fixed properly.
If you want to handle Tibetan text, using X is the only way
for the moment.
---
Kenichi Handa
address@hidden