Re: o with ^ on top of it


From: G. Branden Robinson
Subject: Re: o with ^ on top of it
Date: Thu, 25 Aug 2022 17:43:34 -0500

Hi Alex,

At 2022-08-25T21:08:17+0200, Alejandro Colomar wrote:
> The following code (found in regex.7) wants to represent an 'o' with a
> '^' on top of it (ô).  Is that code correct?

The exhibit, extracted, is this:

\o'o\(ha'

This means "overstrike the glyphs for 'o' and the 'ha' special
character".  The 'ha' special character is described variously as a
caret, circumflex accent, or "hat", and it corresponds to 0x5E in the
ASCII/ISO 8859/Unicode character sets.

The answer to your question is...

Yes and no.

> It's working on the PDF (although it's ugly), but not on the terminal.
> It was changed by a commit that changed ^ by \(ha for compatibility,
> but I'm not sure if that's correct in this specific case.

That change is not really material to the underlying problem.  It's even
arguably wrong, since the troff semantics of '^' are that it's a
combining character (John Gardner, are you listening?).

Let me go through the surface problem and then I'll get to the deeper,
worse ones.

In many circumstances there won't be a difference between ^ and \(ha
anyway; if the output device doesn't have distinct glyphs for a "full"
or "spacing" circumflex and a "combining circumflex accent", then
there's no difference here.
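
For what it's worth, Unicode does keep these apart.  Here is a small
Python sketch (my illustration, not anything groff does internally) that
names three distinct "hat" characters and shows that only the combining
one composes with a base letter.  The constant and function names are
made up for the example.

```python
import unicodedata

# Three distinct Unicode characters that all look like a "hat".
SPACING = "\u005e"    # the ASCII character
MODIFIER = "\u02c6"   # a spacing accent outside ASCII
COMBINING = "\u0302"  # attaches to the preceding base character


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def compose(base, accent):
    """Normalize base + accent to NFC, composing them if Unicode allows it."""
    return unicodedata.normalize("NFC", base + accent)


if __name__ == "__main__":
    for ch in (SPACING, MODIFIER, COMBINING):
        print(describe(ch))
    # Only the combining accent merges with 'o' into one character.
    print(describe(compose("o", COMBINING)))
```

A device whose fonts lack the second or third of these has nothing
distinct to show, which is the "no difference" case described above.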

Furthermore, the fervent street preachers of the man page "I type ASCII
and by God what I should get is ASCII" religion will hack up man.local
or even patch the formatter to force a "full, spacing" circumflex to be
output when "^" occurs in man page sources.

So, when you overstrike these two characters on a device that supports
overstriking, you'll probably get an ugly oversized "^" that intersects
the "o" below it.

That gets us to problem layer two: devices that support overstriking.

Video terminals and their emulators don't.  It's been decades since
paper terminals like the Teletype Model 37, where overstriking was
marvelously simple (just print, backspace, and print), were in common
use.

When you tell a video terminal to do that, you get a destructive
backspace and replacement of the first glyph with the second.

So, in the above case, you'll get '^' instead of (UTF-8) 'ô', which will
please almost no one.
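
To make the difference concrete, here is a toy Python model of the two
kinds of device.  It is only a sketch under simplifying assumptions: one
line of output, backspace as the only control character, and function
names (render_video, render_paper) that I invented for the illustration.

```python
def render_video(stream):
    """Model a video terminal line: a cell holds only the last glyph written."""
    cells, col = [], 0
    for ch in stream:
        if ch == "\b":
            col = max(col - 1, 0)   # move left; nothing is erased yet
            continue
        if col == len(cells):
            cells.append(ch)
        else:
            cells[col] = ch         # overwrite: the earlier glyph is lost
        col += 1
    return "".join(cells)


def render_paper(stream):
    """Model a printing terminal line: ink accumulates in each cell."""
    cells, col = [], 0
    for ch in stream:
        if ch == "\b":
            col = max(col - 1, 0)
            continue
        if col == len(cells):
            cells.append([])
        cells[col].append(ch)       # both glyphs stay on the paper
        col += 1
    return cells


if __name__ == "__main__":
    print(render_video("o\b^"))     # only the glyph written last survives
    print(render_paper("o\b^"))     # both glyphs share one cell
```

In this model the glyph written last always wins on the video terminal,
so the order of the overstruck glyphs decides which one the reader sees.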

Long story short #1: it's a bad idea to ever use the '\o' escape
sequence, not just in man pages, but in any document destined for a
video terminal (emulator).

The problem gets worse because of the quasi-negotiated portable subset
of man(7) that makes the language smaller, makes formatters and
processors of man(7) documents easier to write, and which keeps Ingo
Schwarze and me from fighting more than we already do.

The problem is worse because every avenue I can think of for
circumventing it is foreclosed by portability considerations.  In
ordinary *roff documents, you might do something like the following.

.ds ^o \o'\(hao'\"
foo "\fI[[=o=]]\fP", "\fI[[=\*(^o=]]\fP",

This defines a *roff string called '^o' to manage this troublesome
character.  Note that I've switched the order of the accent and the base
character so that a destructively backspacing terminal will render the
'o' preferentially.  This limps along better for general purpose use
like spelling people's names, but it just moves the lump in the carpet
if you're really trying to illustrate the combined character (UTF-8
again) 'ô'.

However, string definitions are not portable man(7).

One might try to use the real deal only on typesetters and abstract the
character away on terminal devices.

.if n foo "\fI[[=o=]]\fP", "\fI[[=<o with circumflex accent>=]]\fP",
.if t foo "\fI[[=o=]]\fP", "\fI[[=\o'o\(ha'=]]\fP",

But there are _two_ things wrong with this. (A) conditional expressions
are even worse for man(7) portability than string definitions (because
you need a more powerful interpreter) and (B) some nroff devices can
render the (UTF-8) 'ô' glyph just fine, and it doesn't help anyone to
throw away that advantage.

But it gets even worse.  Even if we had a great mandoc/*roff portability
summit and admitted enough functionality to get either of the foregoing
solutions into our officially blessed portable man(7) subset, we'd
_still_ have a problem.

And that is that the available glyph repertoire is not known until
formatting time, and depends on the output device.

Not only is the repertoire of special characters device dependent, but
accented letters in particular were kicked away from the concern of the
formatter per se by Kernighan's device-independent rewrite of troff
circa 1980.  In the 1976 version of CSTR #54 you'll find a fascinating
list of all available special characters and their renderings by the
Graphic Systems C/A/T phototypesetters.  When it came time to give troff
device independence, people clearly realized that it was utterly up to
the device what glyphs were going to be available.

And it gets worse yet!  Back in 1980 people must have figured that video
terminals would never get large font repertoires, and in the event they
did, they'd become effective typesetter emulators and be able to do
things like constructively overstrike an "o" with, in Unicode parlance,
a "modifier letter circumflex accent".

So, troff people merrily carried on building accented glyphs with tricks
like the one you showed.  And video terminals didn't need that crap
because all they had was ASCII and nobody was going to do serious
formatting work on them anyway.  (This despite the fact that DEC was
already clearly increasing its glyph repertoire by the time they put out
the VT220 (1983)--but by then relations between the Bell Labs CSRC and
DEC had long since soured for reasons I've seen alluded to but never
spelled out.  Even the VT100 (1978) had a handful of non-ASCII
characters corresponding to the "special" repertoire of CSTR #54.[1])

Grrr...I'm just going to put another gigantic rant into a footnote.[2]

The bottom line is that there is no portable solution.

The quilt project had a similar issue in its man page.  Here's what I
proposed to them, as part of a patch set that got merged after 4 years.

commit 34a4f3a5c9de82be774e8a50e22ebfef54ac6f5d
Author:     G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Wed Aug 3 21:24:45 2022 +0200
Commit:     Jean Delvare <jdelvare@suse.de>
CommitDate: Wed Aug 3 21:24:45 2022 +0200

    Man page: render Andreas Gruenbacher's name with a u-umlaut

diff --git a/doc/quilt.1.in b/doc/quilt.1.in
index df21752..6905bb4 100644
--- a/doc/quilt.1.in
+++ b/doc/quilt.1.in
@@ -474,10 +474,11 @@ QUILT_COLORS='diff_hdr=35;44'
 .EE
 .
 .SH AUTHORS
+.fchar \\[:u] ue
 .I Quilt
 started as a series of scripts written by Andrew Morton
 .RI ( patch\\-scripts ).
-Based on Andrew's ideas, Andreas Gruenbacher completely rewrote the
+Based on Andrew's ideas, Andreas Gr\\[:u]nbacher completely rewrote the
 scripts, with the help of several other contributors (see the file
 .I AUTHORS
 in the distribution).

(The doubled backslashes are because they preprocess the page source
with a backslash-eating tool.)

But neither the `fchar` request nor the special character _name_ ':u'
is portable.  Nor will '^o' be.
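
For anyone unfamiliar with `fchar`: as I understand it, the request
defines a fallback that is used only when the output device has no glyph
of its own for the named character.  The Python below is a deliberately
simplified model of that lookup order, not groff's implementation; the
device tables and the names resolve and render_name are hypothetical.

```python
def resolve(name, device_glyphs, fallbacks):
    """Return the device's glyph for `name`, else the fallback text, else None."""
    if name in device_glyphs:
        return device_glyphs[name]
    return fallbacks.get(name)


# Hypothetical glyph tables for two output devices.
UTF8_DEVICE = {":u": "\u00fc"}
ASCII_DEVICE = {}

# Mirrors the effect of `.fchar \[:u] ue` in the patch above.
FALLBACKS = {":u": "ue"}


def render_name(device_glyphs):
    """Spell the name using whatever the device can offer for ':u'."""
    return "Gr" + resolve(":u", device_glyphs, FALLBACKS) + "nbacher"


if __name__ == "__main__":
    print(render_name(UTF8_DEVICE))
    print(render_name(ASCII_DEVICE))
```

The point of the lookup order is that a capable device keeps the real
glyph, and only a device without it gets the ASCII spelling.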

Why did my patch take 4 years to get merged?  Don't blame Jean Delvare.
I got involved with fixing up quilt's documentation because I ran into a
bug with its "graph" command when working with some patches I wanted to
apply to Bash.

Hacking on quilt's man page prompted some questions about groff, so I
pushed quilt onto the stack to spend the couple of weeks it would take
to learn what I needed about the man(7) language in depth and detail.

One thing led to another.

I'll probably never get back to Bash.

Regards,
Branden

[1] https://en.wikipedia.org/wiki/VT100
    https://en.wikipedia.org/wiki/VT220
    https://en.wikipedia.org/wiki/DEC_Special_Graphics

[2] On top of that, part of the reason everything around terminal
    handling on Unix, and Linux, sucks is (I surmise) because the Bell
    Labs CSRC leapfrogged from noisy paper terminals to the Blit device
    which they were seemingly convinced was the future.[3]  They backed
    that horse with every dollar available, neglecting "glass TTYs" as
    hard as they could.

    And the C suite at AT&T promptly proved, with the Blit/DMD 5620 just
    as with the 3B20 and the AT&T UNIX PC, that, freed by divestiture and
    given a license to print money in the computer business, it was
    thoroughly incapable of doing so.

    I don't blame the Bell Labs CSRC engineers for this state of affairs
    (though there might be some blame to assign); I'd be surprised if it
    weren't the case that corporate management promised that if the CSRC
    ate its own dog food with the Blit, everyone else in America would
    be doing it too, and your beautiful and wildly successful Unix
    creation will be running in every home and business.

    And if you don't play along, your budget will be slashed to ribbons
    because we're a serious computer business now and your department
    has to be a profit center.

    Oh, the joke was on everyone.  The Blit wasn't the future but
    something that looked a lot like it was, and the center of command
    line life moved from a paper terminal to an emulator for a glass TTY
    manufactured by DEC running in a window system from MIT.

    As far as I know, the last person to seriously try to resolve the
    idiocy surrounding Unix terminal handling was Dennis Ritchie, with
    "streams".[4]  The Unix Support Group, the commercial Unix guys
    behind System III and so on, got hold of it and turned it into
    "STREAMS", of which Ritchie himself was not a fan.[5]  On top of
    that, every BSD and ARPAnet weenie on the planet swore up and down
    that Berkeley sockets were totally better in every way.  (Well, to
    be fair, license-wise, they were.  I don't feel equipped to judge
    their comparative technical merits.  W. Richard Stevens and Douglas
    Comer, however, are.)  I confess I'm a little surprised that no
    Linux kernel hacker has yet proven arrogant enough to believe that
    they can succeed where even Ritchie failed.  Personally, I'll bet on
    Lennart Poettering gobbling it into systemd and making it 2% faster,
    thoroughly incomprehensible, and utterly unmaintainable by anyone
    except IBM/Red Hat staff.  I mean, that's the whole purpose of
    systemd anyway.

[3] https://en.wikipedia.org/wiki/Blit_(computer_terminal)
[4] https://cseweb.ucsd.edu/classes/fa01/cse221/papers/ritchie-stream-io-belllabs84.pdf
[5] "[Streams] means something different when shouted."
