Re: Issue in man page ascii.7


From: G. Branden Robinson
Subject: Re: Issue in man page ascii.7
Date: Mon, 5 Dec 2022 08:06:26 -0600

Hi Alex,

At 2022-12-05T13:35:42+0100, Alejandro Colomar wrote:
> On 12/5/22 09:15, G. Branden Robinson wrote:
> > [[The fix]] would be something like this:
> > 
> > -3: # 3 C S c s     3: !  +  5  ?  I  S  ]  g   q   {\n"
> > +3: # 3 C S c s     3: !\&  +  5  ?\&  I  S  ]  g   q   {\n"
> > -6: & 6 F V f v     6: $  .  8  B  L  V  \\`  j   t   \\(ti\n"
> > +6: & 6 F V f v     6: $  .\&  8  B  L  V  \\`  j   t   \\(ti\n"
> 
> Thanks!

You're welcome, but I think we might have talked past each other below.

> Sure, I try to do it consistently.  If I Cc you is a "just read it if
> you want, not forced, maybe you're busy and someone else on groff@
> picks it up".  :)

Works for me.  :)

> > what's going on here
[the problem that Helge reported]
> > is actually a GNU tbl(1) bug.
> > 
> > https://savannah.gnu.org/bugs/?61909
> I think I'll keep this as a WONTFIX.
> 
> The man-pages don't have stable releases (i.e., what you get at the
> time your distro releases is what you'll get forever), so stable users
> will have this bug unfixed forever until they dist-upgrade, even if I
> fixed it.
> 
> Soon (we hope), groff 1.23.0 will be released, so next OS releases
> (e.g., Bookworm) won't have this bug (and many others that you fixed).
> 
> So, the only problem is for those who use stable distros, but somehow
> install the fresh man-pages.

No, that is not the case.  The ascii(7) page today has _no_ dummy
characters \& after the sentence-ending punctuators [!?.] that are
followed by multiple space characters, _and_ every known released
version of GNU tbl incorrectly applies the configured inter-sentence
space to the second space character after such punctuators.  So people
are getting incorrect output _now_ from this table, and from any others
that regex-match "[.!?]  " in ordinary text blocks, if their configured
inter-sentence space amount is not the default.

That last condition is in fact common for non-Anglophone users of groff.

Let me show you a simple exhibit and then I'll drown you with more
background.

---snip---
$ cat EXPERIMENTS/iss.man
.TH foo 1 2022-12-05 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
.TS
L.
Foo.  Bar.
.TE
.ss 12 0
.TS
L.
Baz.  Qux.
.TE
.TS
L.
Hep.\&  Sid.
.TE
$ nroff -t -man EXPERIMENTS/iss.man # groff 1.22.4 (Debian)

foo(1)                      General Commands Manual                     foo(1)



Name
       foo - frobnicate a bar

Description
       Foo.  Bar.
       Baz. Qux.
       Hep.  Sid.



groff test suite                  2022‐12‐05                            foo(1)
$ ./build/test-groff -t -man -Tascii EXPERIMENTS/iss.man # groff Git
foo(1)                      General Commands Manual                     foo(1)

Name
       foo - frobnicate a bar

Description
       Foo.  Bar.
       Baz.  Qux.
       Hep.  Sid.

groff test suite                  2022-12-05                            foo(1)
---snip---

So, a table entry _lacking_ these dummy character escape sequences \& is
exposed to the old groff bug, which is still in the wild on every system
whose groff predates last week's Git, I suppose.  (This bug is not
man(7)-specific.  It will affect any groff document regardless of macro
package.)
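
If you want to hunt for exposed entries in other pages, here is a rough
sketch of one way to do it.  The function name find_tbl_sentence_gaps is
my own invention, not part of groff or the man-pages tooling.  It assumes
tables open with a line starting ".TS" and close with one starting ".TE",
and it simply applies the "[.!?]  " pattern to every line in between, so
it can report false positives (format or option lines, say) and knows
nothing about text blocks or column separators.

```shell
# find_tbl_sentence_gaps FILE...
# Hypothetical helper: print "file:line:text" for each line inside a
# .TS/.TE region that has sentence-ending punctuation followed by two
# spaces.  A dummy character \& after the punctuation breaks the match.
find_tbl_sentence_gaps() {
    awk '
        FNR == 1  { in_table = 0 }            # reset state for each file
        /^[.]TS/  { in_table = 1; next }      # table start
        /^[.]TE/  { in_table = 0; next }      # table end
        in_table && /[.!?]  / {
            printf "%s:%d:%s\n", FILENAME, FNR, $0
        }
    ' "$@"
}
```

Run against the iss.man exhibit above, this should flag the "Foo.  Bar."
and "Baz.  Qux." lines and skip "Hep.\&  Sid.", because there the \& sits
between the period and the spaces.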

Lengthy background
==================

It can be seen that the difference in output was prompted by this line.

.ss 12 0

The formatter's default is equivalent to this.

.ss 12 12

The function of the number "12" is not obvious here; it arises from
traditions of mechanical typography.  But what it _means_ is, "put one
word space between each word and put one (additional) word space between
sentences on the same output line".

Yeah, but nobody should be manipulating the inter-sentence spacing in a
man page, right?  Right.  But, localization files...

$ git grep 'ss 12 0' tmac
tmac/cs.tmac:.ss 12 0
tmac/de.tmac:.ss 12 0
tmac/fr.tmac:.ss 12 0
tmac/groff_man.7.man.in:\&.ss 12 0 \e" See groff(@MAN7EXT@).
tmac/it.tmac:.ss 12 0
tmac/sv.tmac:.ss 12 0

Not to mention the fact that this request could appear in a troffrc or
man.local file.  In short, this is a user-configurable parameter and a
portable man page should not assume the inter-sentence spacing amount.

\& works to hide the bug even on old (well, current :-/ ) GNU tbl
because it suppresses the detection of sentence endings altogether.

\& does have other semantics in tbl(1) tables; it is used to align
the units place in columns using a numeric format (classifier "N" rather
than "L" or "C", for instance), but I've never in my life seen that
format used in a man page.  (It is also hard to grep for without gagging
on false positives.)  But, in principle, telling people just to work
around the bug by adding \& in _all_ circumstances is a bad idea for
this reason.[1]

There's a lot of bloody history around inter-sentence spacing, enough
that we have to cover the subject in the groff Texinfo manual,[2] and it
is compounded by luminaries like the general editor of the Chicago
Manual of Style lying to the public about that history.  groff maintains
compatibility with AT&T troff in this area.

In Europe, supplemental inter-sentence space is _not_ common, and I
gather there is some kind of official European Union style guide that
militates against it.  It is binding only upon official EU publications,
but many organizations have adopted it nonetheless--it saves the expense
of maintaining a style guide of one's own, and plenty of people
in the U.K. who voted for and celebrate Brexit nevertheless slavishly
follow EU prescriptions in this area.

> That can be random people that install random packages from source, or
> contributors to the pages.  For both of them, I specify the
> dependencies in the INSTALL file, so I hope they don't blame me too
> much; they should ask their distributor about backporting groff 1.23.0
> for installing the pages from source, or install groff from source, or
> be happy with small glitches like this :)

I understand if you don't want to mess with a belt-and-suspenders
approach, but I want to make sure you're making an informed decision. :)

> However, things like .MR concern me more.

Me too.  I'm trying to contain my expectations because history is
replete with nice new features that suffered deaths of neglect.

(warning: inside baseball^W^Wgroff internals)

Right now even email and web URLs in man pages aren't hyperlinked in
PDF, and that's silly.  So I'm trying to orthogonalize man(7) hyperlink
support so I can couple it to gropdf(1)'s "pdfmark" support.

Or I would be working on it, if the under-documented "pdfhref" macro
weren't structured to make it a pain in the ass.  I guess whoever
designed that didn't expect someone to format link text in a diversion.
Also I discovered an exciting new (old) bug when formatting HTML.  :(

Anyway, once that is done, I can integrate Deri James's cool trick for
converting "local" man page cross references into PDF bookmarks, so you
can do something like, hypothetically,[3] produce a 380-page compilation of
60 man(7) and mdoc(7) documents that have hyperlinked cross-references
to each other, and present "man:blah(1)" hyperlinks for pages outside
that collection.

I might fail at orthogonalizing, but I'll do my damnedest to at least
get this _working_.  ("groff 1.24: the same but with elegance"... :-| )

> I'd be happy doing some radical changes and requiring 1.23.0 as a bare
> minimum, and use MR right after the Bookworm release.

[insert Kang and Kodos clip]

> Hopefully that triggers backporting of groff; maybe you can do that as
> a future maintainer of the Debian package?  :P

Maybe, if groff 1.23 proves not to have many surprising regressions
from 1.22.4, that would be feasible, but I would prefer to delegate that
sort of task.  Build a team wherever you can.  I've gone to considerable
lengths to avoid regressions: I have automated test #152 in my working
copy now.  (groff 1.22.4 had three.)

> > [1] (groff insider stuff)
> 
> The parentheses in here help a lot with long messages :)

I fear "tl;dr" was coined around 1999 by people exposed to my emails.

Regards,
Branden

[1] tbl uses the _leftmost_ `\&` in a numerically formatted entry as the
    alignment position.  For instance, imagine a business that produced
    formatted reports by accepting text input from a terminal^Wweb
    form.  Also assume that the report generator wasn't too fastidious
    about tidying up that input.

.\" nroff -t | cat -s
.TS
tab(@);
C S
C S
L N.
Amy's Kennels
Boarded Animals, Week of 2022-12-05
Size@Name and check-in weight (kg)
Large@Max      25.6
\^@Sassy.      44.8
Small@Henrietta    6.24
\^@T. J. Peepers.\&  (chinchilla) 3.03
.TE

This is not a _well_-designed table, but it is a _plausible_ one.  Well,
almost.[4]  But adding another \& later at the "real" position where the
decimal point should be aligned will not help, because the leftmost one
controls.

[2] 
https://git.savannah.gnu.org/cgit/groff.git/tree/doc/groff.texi?id=aa20f5961cb0788e888180c57add5a452ce9d8d6#n4976
[3] 
https://git.savannah.gnu.org/cgit/groff.git/tree/doc/doc.am?id=aa20f5961cb0788e888180c57add5a452ce9d8d6#n257
[4] I'd like to meet the web-form-using kennel service staffer who
    knew to sneak *roff escape sequences into the input.  But we all
    know that failure to validate input is as common as street litter.
