bug-groff
[bug #58930] take baby steps toward Unicode


From: G. Branden Robinson
Subject: [bug #58930] take baby steps toward Unicode
Date: Sun, 23 Oct 2022 15:16:31 -0400 (EDT)

Update of bug #58930 (project groff):

                  Status:               Need Info => In Progress            

    _______________________________________________________

Follow-up Comment #24:


[comment #23:]
> A few nits about specific definitions:
> 
> [comment #21:]
> > +.fchar \[u2000] \h'1n' \" en quad
> > +.fchar \[u2001] \h'1m' \" em quad
> > +.fchar \[u2002] \h'1n' \" en space
> > +.fchar \[u2003] \h'1m' \" em space
> 
> As the "quad" and "space" forms are canonically equivalent (see
> http://www.unicode.org/mail-arch/unicode-ml/y2003-m04/0316.html), might it be
> better DRYwise to define one in terms of the other?

Indeed so; thanks for the reference.  Since we otherwise don't use the term
"quad" (despite its impressive typographical pedigree), I think I'll alias the
quads to the spaces.  Not that this should make much, if any, practical
difference.
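
A minimal sketch of what that aliasing could look like, assuming the en and
em space definitions from the quoted patch stay as they are.  Putting the
comments on their own lines is just a choice made for this illustration.

```groff
.\" en space and em space keep their horizontal-motion definitions
.fchar \[u2002] \h'1n'
.fchar \[u2003] \h'1m'
.\" en quad and em quad defined in terms of the corresponding spaces
.fchar \[u2000] \[u2002]
.fchar \[u2001] \[u2003]
```

With this arrangement, a later change to the width of either space would carry
over to the matching quad without a second edit.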

> 
> > +.fchar \[u2016] || \" double vertical line (matrix norm)
> 
> This one presents a kerning issue: if two U+2016s are set next to each
> other, they should have a little space between them.

You've studied kerning a lot more than I have.  I'm not sure that kerns are
even applied to fallback character definitions.  Or is that what you're
proposing to work around?

> (In typeset output I've found \| sufficient to make ersatz U+2016s defined
> as two U+007Cs not look like an unbroken row of bars.  In terminal (or any
> monospace) output, if the font has no U+2016, a full space becomes the only
> way to distinguish two U+2016s defined as above from four U+007Cs.)

I'm not sure we can achieve unambiguous typography on typesetters, let alone
terminals.  More precisely, I am not sure that one can reliably infer input
characters from rendered glyphs in the general case.  For one thing, fonts
aren't under our control at all.  For another, the Unicode glyph confusability
issue <https://unicode.org/reports/tr36/> suggests that we couldn't overcome
this problem even if we did have such control.

I am proposing to punt on this issue and see if the users complain,
basically.
 
> > +.fchar \[u2018] ` \" left single quotation mark
> > +.fchar \[u2019] ' \" right single quotation mark
> 
> Defining these as \[oq] and \[cq] seems more semantically meaningful (and
> less prone to failure, e.g. if the user has ".tr"ed ` to something else).

Quite right.  I'll do this.
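
Roughly, and only as a sketch of the suggested change rather than the final
form of the patch:

```groff
.\" left single quotation mark, via the opening-quote special character
.fchar \[u2018] \[oq]
.\" right single quotation mark, via the closing-quote special character
.fchar \[u2019] \[cq]
```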

> > +.fchar \[u201C] \[lq] \" left single quotation mark
> > +.fchar \[u201D] \[rq] \" right single quotation mark
> 
> These definitions are fine, but the comments say "single" where they mean
> "double."

Whoops!  Thanks again.  Will fix.

> > +.fchar \[u2025] .\|. \" two dot leader
> > +.fchar \[u2026] .\|.\|. \" horizontal ellipsis
> 
> With the internal space between the dots, consecutive versions of these will
> also typeset poorly.

I'm tempted to punt on this one, too.  Possibly no serious font for
typesetting even needs to encode these characters, and in groff, if you want a
well-typeset leader, a fundamental formatter feature (Control+A) will give you
one of whatever length you like.

> > +.fchar \[u203D] \z?! \" interrobang
> 
> Per bug #62983, this one might be sequestered behind an ".if t".

Another fair point.
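
A sketch of how that guard might look, assuming the definition itself is
otherwise unchanged from the quoted patch:

```groff
.\" interrobang: define the overstruck fallback only on typesetter devices
.if t .fchar \[u203D] \z?!
```

The effect is that nroff-mode (terminal) devices get no fallback at all from
this line, so a missing U+203D there would be reported like any other missing
glyph.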

This exercise has made me wonder if we could use a warning category (or some
other mechanism) to inform the user when fallbacks are used.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58930>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



