
Re: Another way of tweaking lout to support alien characters?


From: Valeriy E. Ushakov
Subject: Re: Another way of tweaking lout to support alien characters?
Date: Thu, 3 Dec 1998 05:24:38 +0300

On Wed, Dec 02, 1998 at 10:34:02PM -0000, Ted Harding wrote:

> My impression was that Matej wanted to escape from dependence on
> special printer fonts, and since PostScript can do it with standard
> fonts by making a composite glyph he would like to know how to do it
> within Lout.
> 
> I've looked at what you wrote about UA and AC functions without being
> clear whether it corresponds to what I'm talking about. To refer again
> to gtroff (sorry about that, but it is what I know, very well, how to use)
> you can define character translation. For instance:
[...]
> So in gtroff you can type single characters and have them translated to
> compound definitions (whose primary names are "\(??" forms).
> 
> So I guess the question for Lout is: is there a corresponding
> straightforward translation mechanism? I think it should be so, since from
> my understanding of the "expert" document you could define anything
> (including a single input character) to stand for any defined object.
> So <z-caron>[8859-2] = 0xBE = 190 = <three/four>[8859-1] which is
> @Char threequarters in Lout. So you can define "@Char threequarters"
> to give rise to the composite glyph for <z-caron>. The main question
> which is not clear to me is whether Lout, on seeing the byte 0xBE
> in the input, will recognise this byte as corresponding to the Lout
> evocation "@Char threequarters" for this purpose. If this approach would
> work, then I don't think the "UA"/"AC" mechanism would be needed for this.
> Am I correct here, or not understanding something?
> 
> Or can you simply define "Z" to mean the Lout definition of the composite
> object (with compounded metrics, of course), where "Z" is the single
> character with code 0xBE?

First, thanks for sharing your gtroff experience.  It's always nice to
have on board someone who knows how some feature works in a cousin
product.  This helps us avoid repeating old mistakes, copy good
design, or base a new design on an old one.  So no need to apologize.
(OTOH, I heard that in British English a sentence is considered
incomplete if it doesn't contain "please" or "sorry" or, preferably,
both - and I wholeheartedly agree with this practice.)

Yes, you could define (almost) anything to be a symbol.  So you can
ask Lout to correct your favorite typos, like:

    def hte { the }

but this definition, quite correctly, won't affect the word
"lighter", because the lexer treats "lighter" as a single token -
just like in other programming languages.

So while you can define "\300" to synthesize an R-acute, Lout will
only recognize it when it stands alone.  Also, as soon as you define
"\300" to be a composite object, it is not a letter anymore and will
not participate in, e.g., hyphenation.
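
For instance, such a definition might look roughly like this (an
untested sketch: "&0io" is a zero-length gap in Lout's overstrike
mode, "acute" is a standard glyph name, and no attempt is made at
proper accent placement):

    # Naive composite: the acute is simply stamped over the R with
    # no shifting, so real use would need better metrics.
    def "\300"
    { R &0io @Char acute }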

Another problem is that Lout is still a little bit Latin1-centric.
When Lout reads definitions, its idea of (syntactic) letters is that
of Latin1.  So, apart from the English alphabet, the at-sign and the
underscore, the following characters from the upper half of the table
are considered letters: "\300" to "\377", except "\327" (multiply)
and "\367" (divide).

Now consider z-caron ("\276"), which is not a letter according to
Lout's idea of character categorization.  So while you can define a
symbol named "\276", you can't define a symbol with z-caron in its
name unless all the other characters in that name are non-letters as
well.  Similarly, if I want to define a Lout symbol with a Russian
name, I can't use cyrillic small letter ve ("\327") or cyrillic
capital letter ve ("\367").
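
To make the naming rule concrete (a sketch, assuming Latin-2 input;
"zcaron" is a standard glyph name used only for illustration):

    def "\276" { @Char zcaron }   # OK: the whole name is non-letters
    # def "ku\276a" { ... }       # not OK: letters mixed with the
                                  # non-letter byte "\276"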

This situation cannot be resolved (within the current framework)
because the Lout parser doesn't know anything about charsets; it only
sees bytes (octets).  If "\327" and "\367" are added to the list of
letters, people who use Russian (koi8-r or 8859-5) or Greek (8859-7)
will be able to use these characters in alphabetic symbol names,
because in those charsets these codes are assigned to alphabetic
characters.  However, in the LatinN charsets these two codes are
assigned to multiply and divide, which are punctuation characters,
and users of those charsets will no longer be able to define symbols
with these punctuation names, because Lout's lexer will happily merge
such "letters" with immediately adjacent letters.  At least, this is
my understanding from a quick glance through the code; Jeff will
correct me if I'm wrong.

But that was a digression.

I think that it's good for zcaron to be zcaron all the way until
it's time to print it.  This way it will participate in hyphenation,
small-caps transformation, sorting, etc. just like any other letter
does.

Simple words that are not recognized as defined symbols are finally
mapped into objects that are concatenations of glyphs for the given
characters (the mapping from characters to glyphs is defined by LCM
files).  One word can go to several places (e.g. a section title goes
to the TOC and the running header), and in each of these places it
can be typeset in a different font.  Some of those fonts may have a
glyph for zcaron and some may not.  If the glyph is there, it's
certainly better to use it.

If it's not, we could use UA/AC to synthesize it.  This would work
like the explicit definition that you suggest, but the definition
would be synthesized as necessary and applied at the very last
moment.

Let's digress a bit one more time.  A curious feature of Lout is
that language (@Language) is part of the style and is thus inherited
through the point of appearance.  This means that a word (a sequence
of bytes) is interpreted as characters depending on the dynamic
context; bytes are not treated as characters with an inherent
identity.  Lout will happily typeset "\300\301\302" in whatever the
current font is.  OTOH, @Char { glyphname } will check that there is
such a glyph in the current font.
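
A small illustration of the difference (a hedged sketch; "Agrave" is
just a standard glyph name picked for the example):

    French @Language { "\300\301\302" }  # typesets whatever glyphs
                                         # the current font has at
                                         # codes 300, 301, 302
    @Char Agrave                         # checked: complains if the
                                         # current font has no glyph
                                         # named Agrave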

Ideally, if bytes really were characters with identity, we would
know that "\300\301\302" are character codes for some known
characters from some known charset, and that these characters map to
known glyphs.  Then, when the word is to be printed, we would map
"\300\301\302" into

    @Char{glyphX1}@Char{glyphX2}@Char{glyphX3}

where glyphX1 is the glyph for the character coded at codepoint
\300 in that charset (not the glyph encoded at \300 in the current
font!) - and it would be an error if the font doesn't have these
glyphs.  But this requires that we know what charset to use to
interpret the character code \300.

But I can't reason clearly about this any more at 5am, so I'd better
stop here.


Jeff, will you be so kind as to comment on this?

SY, Uwe
-- 
address@hidden                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen

