Re: On Unicode

lout-users

From:	Valeriy E. Ushakov
Subject:	Re: On Unicode
Date:	Fri, 19 Mar 1999 14:16:20 +0300

On Fri, Mar 19, 1999 at 01:47:13PM +1100, Darrin wrote:

> I think lout could be adapted to handle this input quite easily. The
> real work would be in the PostScript backend. How does it handle large
> character sets? Is it unicode-based?

No, the real problem is coming up with a good font selection scheme
(FSS).  The problem you refer to is part of this one.

E.g. if you have a text in a mixture of languages and several fonts
that each have glyphs for some particular language, you would want to
set the font once and let the formatter take care of switching fonts
as necessary.  For example, Latin, Cyrillic and IPA share the same
lettering principles, so you can apply one type design to these
scripts (e.g. Times, Helvetica); you just want to say "I want
Times" and let the system switch between Latin and Cyrillic Times for
you.  Of course, if you use one font (say, WGL) with all the necessary
glyphs, this problem doesn't arise.

However, other scripts have different lettering principles - you cannot
meaningfully apply the Times face design to Japanese or Arabic.  So you
need a font selection scheme that lets you specify which concrete
fonts to use for particular sets of glyphs.
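
To make the idea concrete, here is a minimal sketch of such a scheme:
a table that assigns a concrete font to each range of code points, and
a function that splits a text into runs set in one font each.  The font
names and the choice of ranges are my assumptions for illustration
only; nothing like this exists in Lout.

```python
# Hypothetical font selection table: (first code point, last code point, font).
# The font names are made up for illustration.
FONT_RANGES = [
    (0x0020, 0x024F, "Times-Roman"),     # Basic Latin .. Latin Extended-B
    (0x0250, 0x02AF, "Times-IPA"),       # IPA Extensions
    (0x0400, 0x04FF, "Times-Cyrillic"),  # Cyrillic
    (0x3040, 0x30FF, "Ryumin-Light"),    # Hiragana and Katakana
]


def font_for(ch):
    """Return the font assigned to a character, or None if no range covers it."""
    cp = ord(ch)
    for lo, hi, font in FONT_RANGES:
        if lo <= cp <= hi:
            return font
    return None


def split_runs(text):
    """Split text into (font, substring) runs so the font switches only when needed."""
    runs = []
    for ch in text:
        font = font_for(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

The user says "Times" once; the switch between the Latin and Cyrillic
fonts falls out of the table.  A real scheme would also have to decide
what to do with characters shared between scripts (spaces, digits,
punctuation), which this sketch simply assigns to the Latin font.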

> Also Unicode doesn't define character codes for ligatures, which
> would make support of them difficult.

What?!  Devanagari has 1046 _mandatory_ ligatures registered as
occurring in real texts, but Unicode doesn't assign a single codepoint
to any Devanagari ligature.

A ligature is a purely graphic artefact.  Glyphs are combined into
ligatures, but the characters stay the same.  A font is a collection of
glyphs, not characters, and the indexing into the font to access the
necessary glyphs can be pretty much arbitrary.  In general, (1) the
encoding of a sequence of characters (text) and (2) the encoding of the
sequence of glyphs that displays these characters (e.g. the argument to
`show') can be quite different.

So a good Nagari font will provide the 1046 ligatures mentioned above
and even more, since some of them have several allographs, and software
will take care of converting a sequence of characters into ligature
codes.

In general, the mapping is defined by a (font-specific) FSM that takes
character codes as input and produces glyph codes as output.  For
Latin script set in Courier the mapping is 1-1.  A proportional font
might map 2 or 3 characters to one glyph (ligature), e.g. fi and ffi.
Asian scripts like Arabic or Devanagari require significantly
more complex FSMs.
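
For the simple Latin case, the character-to-glyph mapping can be
sketched as a greedy longest-match substitution over a ligature table.
This is a simplification of a real FSM (there is no context and no
reordering), and the table and the glyph names such as "f_f_i" are
hypothetical, not taken from any actual font.

```python
# Hypothetical ligature table of one font: character sequence -> glyph name.
LIGATURES = {
    "ffi": "f_f_i",
    "ffl": "f_f_l",
    "ff": "f_f",
    "fi": "f_i",
    "fl": "f_l",
}
MAX_LEN = max(len(seq) for seq in LIGATURES)


def to_glyphs(text):
    """Map a character sequence to a glyph sequence, preferring the longest ligature."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to two characters.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in LIGATURES:
                glyphs.append(LIGATURES[chunk])
                i += n
                break
        else:
            # No ligature starts here: the mapping is 1-1 for this character.
            glyphs.append(text[i])
            i += 1
    return glyphs
```

With an empty table this degenerates to the 1-1 Courier case.  Note
that the input text is never changed: "office" stays six characters,
only the glyph sequence handed to the output device gets shorter.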

OpenType is an attempt to address this problem by placing the burden
of ligature selection and relative glyph positioning (the FSM) on
the font designer, so that an application can feed character codes
directly and let the FSM(s) provided by the font perform the
necessary conversions (the main tables are GSUB and GPOS).

One existing example of such an FSS that is probably familiar to many
on this list is JDK's font.properties file.

Example specification:

# Fonts
serif.plain.0=-linotype-times-medium-r-normal--*-%d-*-*-p-*-iso8859-1
serif.1=-morisawa-ryumin light kl-light-r-normal--*-%d-*-*-m-*-jisx0208.1983-0
serif.2=-morisawa-ryumin light kl-light-r-normal--*-%d-*-*-m-*-jisx0201.1976-0
serif.3=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific
serif.4=--symbol-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific

# FSM's
fontcharset.serif.0=sun.io.CharToByte8859_1 
fontcharset.serif.1=sun.awt.motif.CharToByteX11JIS0208
fontcharset.serif.2=sun.awt.motif.CharToByteX11JIS0201
fontcharset.serif.3=sun.awt.motif.CharToByteX11Dingbats
fontcharset.serif.4=sun.awt.CharToByteSymbol
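
One way to read this specification is as an ordered list of (font,
converter) pairs: for each character, use the first font whose
converter can handle it.  The sketch below assumes that reading; it is
not taken from the JDK sources.  Python codecs stand in for the
CharToByte classes, and they do not cover exactly the same characters
as the Java converters.

```python
# Hypothetical component list, tried in order, loosely modelled on the
# serif.N / fontcharset.serif.N pairs above.
COMPONENTS = [
    ("serif.0", "latin-1"),  # stand-in for sun.io.CharToByte8859_1
    ("serif.1", "euc_jp"),   # stand-in for the JIS X 0208 converter
]


def can_encode(ch, codec):
    """True if the codec has a code for this character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def component_for(ch):
    """Return the first component font whose charset covers the character."""
    for name, codec in COMPONENTS:
        if can_encode(ch, codec):
            return name
    return None
```

Since the entries are tried in order, a character covered by several
charsets always goes to the earliest font, so the order of the entries
is itself part of the font selection.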

PS: I've got no replies to my proposal to set up a separate mailing
list for discussion of Unicodification of Lout (most issues related to
that topic are of no interest to the users of Lout).  So let me issue
a second call for participants - if you're interested in hashing these
things out, drop me a personal email.  If there's enough interest (say
5 people) - I'll set up a new list.

Note that at this point all we need is a thorough discussion, not
hacking away instantly.  Unicode poses quite a few interesting problems
and any coding is a waste unless those problems are understood and
discussed, and some design is proposed.

Interesting topics are font support (the biggest one), hyphenation,
collation and others.

Maybe this list will just die quietly like jadelout did.  But it's
worth a try.  Hundreds of thousands of TeX users can sit and wait
while some brave guys bite the bullet - and indeed two guys bit it
and the TeX community now has Omega.  But the lout mailing list has only
200+ subscribers, so 1000 direct users (i.e. people who write Lout
documents, not those who use Lout as a backend, as in debiandoc) would
be a good guesstimate.  This means that the odds that someone will bite
the same bullet for Lout are pretty negligible.

Well, I guess I'm trying to communicate the old "if not I - then who"
and guilt people into participating, so never mind.  ;-)

But if you are interested in the topic - drop me a note anyway.

SY, Uwe
-- 
address@hidden                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen
