[Texmacs-dev] utf-8 support update


From: Felix Breuer
Subject: [Texmacs-dev] utf-8 support update
Date: 23 Nov 2002 21:24:17 +0100

Hello!

I have begun work on the TeXmacs universal character set -> Unicode
mapping. It can be found at my site www.fbreuer.de/texmacs. 

> It may not be really necessary to put the comments (a lot of extra work).

Those comments were from http://www.lut.fi/man/tex/dcfonts/node7.html, I
started working from there.

> You have to be careful with the number of bytes for each character.
> In the Cork encoding, each character only takes one byte,
> so you should write #41 for "A" rather than "#0041".

I changed the mapping accordingly.
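
To illustrate the one-byte convention, here is a minimal sketch of a table
lookup that turns Cork-encoded bytes into UTF-8. The names encode_utf8 and
cork_to_utf8 are hypothetical, and the table is only a three-entry excerpt
(not the actual mapping file); bytes missing from the excerpt become U+FFFD.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF) as UTF-8.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical converter: each Cork character is exactly one input byte,
// so the table is keyed by a single byte (#41, not #0041).
std::string cork_to_utf8(const std::string& in) {
    static const std::unordered_map<unsigned char, std::uint32_t> table = {
        {0x41, 0x0041},  // "A"
        {0xE4, 0x00E4},  // a with diaeresis
        {0xFF, 0x00DF},  // sharp s sits at #FF in Cork, not at #DF
    };
    std::string out;
    for (unsigned char byte : in) {
        auto it = table.find(byte);
        // Bytes outside this illustrative excerpt map to U+FFFD.
        out += encode_utf8(it != table.end() ? it->second : 0xFFFD);
    }
    return out;
}
```

With this excerpt, cork_to_utf8("A") yields the single byte 0x41, while the
Cork byte #FF yields the two UTF-8 bytes 0xC3 0x9F.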

> We will rather write these conversion routines in C++ (they must
> be really fast) in src/Resources/Translators

I do not understand how this translator works. It never seems to return
a translated string. And instead of building a table associating indices
into the string-to-be-translated, it associates strings with indices.
Are TeXmacs hashmaps multimaps? I am lost. Could somebody explain it to
me?

Since we are talking about converting the encoding of a string, not
translating its contents, wouldn't it be better to add a function
as_utf8 to string.cc? This would lend itself better to supporting
other encodings via iconv.h. However, I don't want to argue, I just
need someone to enlighten me :)
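
A rough sketch of what an iconv-based as_utf8 could look like follows. It
uses std::string rather than the TeXmacs string class, and the signature
(including the from_charset parameter) is an assumption for illustration,
not existing TeXmacs code. Which charset names are accepted depends on the
platform's iconv implementation.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical signature: convert `input` from `from_charset` to UTF-8.
std::string as_utf8(const std::string& input, const char* from_charset) {
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open: unsupported conversion");

    std::string out;
    char buf[256];
    // iconv takes a non-const char** on glibc; it does not modify the input.
    char* src = const_cast<char*>(input.data());
    std::size_t src_left = input.size();

    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        int err = errno;  // save before any other call can overwrite it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the output buffer filled up; loop again.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return out;
}
```

A call such as as_utf8(text, "ISO-8859-1") would then cover any source
encoding the local iconv knows about, without a hand-written table for each.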


Regarding the universal characters: <big|cap> is a different character
than \<cap\>, so <big|...> nodes would have to be converted as well. Why
isn't <big|cap> encoded as <big|\<cap\>>? The latter seems more
consistent to me. How about <left|...>, <right|...>, <mid|...>?
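
To make the point concrete: Unicode has separate code points for the binary
and the big operator, so a mapping needs one entry for each. The sketch
below is illustrative only; symbol_to_unicode and the string keys are
hypothetical spellings, not the way TeXmacs stores these nodes.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical lookup; returns 0 when the symbol is not in the excerpt.
std::uint32_t symbol_to_unicode(const std::string& symbol) {
    static const std::unordered_map<std::string, std::uint32_t> table = {
        {"<cap>", 0x2229},      // U+2229 INTERSECTION
        {"<big|cap>", 0x22C2},  // U+22C2 N-ARY INTERSECTION
    };
    auto it = table.find(symbol);
    return it != table.end() ? it->second : 0;
}
```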


Felix
