From: G. Branden Robinson
Subject: Re: using the `hw` request in man pages (was: adjustment and hyphenation in mdoc(7) pages)
Date: Sun, 26 Mar 2023 01:15:27 -0500

Hi Alex,

At 2023-03-25T20:38:39+0100, Alejandro Colomar wrote:
> On 3/25/23 09:40, G. Branden Robinson wrote:
[man pages using the `hw` *roff request to override hyphenation]
> > This is true, and I do on rare occasions see man pages doing this.
> 
> Maybe it would be a good thing for man.local?  For manual pages, it
> seems a bit too repetitive, IMO.  Maybe for a manual page that uses
> very rare keywords would be a good place to use those.

The man.local file is not something that packages installing man pages
can update.

My first thought was that distributors could support a man.local.d
directory.  Packages with man pages would install short files there
containing `hw` requests, and distributors would change man.local to
`so` or `soquiet` (groff 1.23) each of those files.  We could call these
"hyphenation override files".

But there are still several problems.

1.  `so` and `soquiet` don't perform glob expansion.  So man.local
    would _still_ need to be edited to name every file required by a
    package providing man pages and requiring this feature.

2.  man.local _could_ source a single file, updated by some trigger
    or post-installation script, that lists the hyphenation override
    files of every package containing man pages.

3.  These files could override each other.  What if one package wants a
    word hyphenated one way and another package has a different
    preference?  Worse, what if a man page expects groff's hyphenation
    patterns to apply to a word, but some unrelated package has gone and
    stomped all over it?  Deciding who has precedence seems intractable.

4.  Nothing prevents a package from populating the override files with
    things other than `hw` requests.  Not only is this potentially
    nefarious, the fact that every override file would get read for
    every man page rendered means that someone else's botched
    hyphenation override file ruins every man page you try to read.
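
For illustration only, the post-installation step from item 2 might be
sketched as below.  The directory name and the ".hw" suffix are
assumptions of this sketch, not anything groff or a distributor
provides; only the `soquiet` request name comes from groff 1.23.

```python
from pathlib import Path


def build_index(override_dir):
    """Return roff source with one `soquiet` line per override file.

    Files are listed in sorted order so the result is reproducible; the
    ".hw" suffix is an assumption of this sketch.
    """
    files = sorted(Path(override_dir).glob("*.hw"))
    return "".join(f".soquiet {path}\n" for path in files)


if __name__ == "__main__":
    # Hypothetical location; no distribution is known to provide it.
    print(build_index("/etc/groff/man.local.d"), end="")
```

Sorting fixes the order in which the files are read, but it does not
settle items 3 and 4: whichever file is read last still wins, and
nothing checks what the files contain.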

My net takeaway from this is that it is indeed better to keep
hyphenation overrides within individual man pages.  But maybe the only
way to know how tedious this really is is to see how much a large,
practical corpus of man pages, like the Linux man-pages, requires it.

However,

5.  Now that serial processing of man pages is practical (i.e.,
    "groff -man page.1 page.2 page.3 anotherpage.1" and so on), item #3
    above rears its head even without any shenanigans involving
    man.local.  That file could be empty or nonexistent and this would
    still be an issue.  The "good" news is that most people don't bother
    to serially render pages, and it will likely be a while, if ever,
    before man-db man(1) exercises this feature.  Still, the threat
    exists.

One of the themes of my suggested revisions to GNU troff has been to
provide ways to unwind or reset things that historically haven't been
available.  One of those is environment removal (Savannah #60954).

Another that has occurred to me is hyphenation override removal.

Today, invoking the `hw` request without arguments does nothing.  We
could change it to clear any existing hyphenation overrides.

Or, perhaps better, we could add an 'hwrm' or 'rhw' request; if given
arguments, it reads each word (ignoring hyphens), matches it against the
existing list of overrides, and removes the word if found.  If given no
arguments, it removes all overrides.  Then, an.tmac (and doc.tmac) could
call it when hitting `TH` (and `Dd`) macros, tidying up the state of the
formatter for the next document.
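
A toy model of the idea, in Python rather than troff, might look like
the following.  It is not groff code: the `Formatter` class, the
`render` helper, and the `reset` flag are invented for this sketch, and
it assumes that a later `hw` for the same word replaces an earlier one.

```python
class Formatter:
    """Toy model of the formatter's hyphenation override list."""

    def __init__(self):
        # Maps a word without hyphens to its hyphenated form.
        self.overrides = {}

    def hw(self, *words):
        # Like the existing request: with no arguments, nothing happens.
        for word in words:
            self.overrides[word.replace("-", "")] = word

    def rhw(self, *words):
        # Hypothetical request: with no arguments, remove every override.
        if not words:
            self.overrides.clear()
            return
        for word in words:
            # Hyphens in the argument are ignored when matching.
            self.overrides.pop(word.replace("-", ""), None)


def render(pages, reset):
    """Process pages serially; report the overrides each page inherits."""
    fmt = Formatter()
    inherited = []
    for name, hw_words in pages:
        if reset:
            fmt.rhw()  # stand-in for `TH` calling the cleanup request
        inherited.append((name, dict(fmt.overrides)))
        fmt.hw(*hw_words)
    return inherited
```

With `reset` false, the second page in a run such as
`render([("page.1", ["pro-ject"]), ("page.2", [])], False)` inherits
page.1's override, which is the situation item 5 describes; with
`reset` true, each page starts from an empty list.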

> > I'm ambivalent about the use of the `hw` request in man pages.
> > 
> > 1.  I like the clarity of the "never use *roff requests" rule.  My
> >     internal bright-line rule enforcer is enamored of this
> >     principle.  It keeps the fingers of the novice out of the meat
> >     grinder.
[...]
> > 5.  We could hold to principle #1 by adding a man(7) macro, `HW`,
> >     which simply wraps `hw`.
> 
> What's the gain of such a thing?

Adherence to Puritan principle; I'd be able to keep pronouncing from on
high in my ivory tower that one shouldn't invoke *roff requests in man
page documents.

> Translators will have the same problem translating .hw than
> translating .HW.  The only difference is that .HW would appear in the
> man(7) spec, which would force them to recognize it, as opposed to
> saying "we don't support plain roff".

Right.  That's worth something to me, though maybe not enough in this
case to pay its freight.  I have unrelated similar cases in mind.[1]

> .MR will only improve the status quo, so if not many complained till
> now (I did, but 1 is not too many), there will be even less need for
> .hw soon.

Yes.  Cutting down the need for `hw` or `\%` is one of its advantages.

> I'd say, let's defer this problem for long after .MR, and see if there
> are any remaining issues.  Same with \%, which is why I don't yet want
> to introduce it in the Linux man-pages.

Fair enough.  The other side of the coin stamped "PORTABLE" is that
nobody said you have to use all of the features that are.  :)

Regards,
Branden

[1] I find use of `br` also excusable in man pages, when no existing
    man(7) macro will serve.  (You _could_ drop in an `RS`/`RE` pair
    with nothing between, or call `RE` or `EE` "unpaired", but those are
    pretty kludgy.)  Yes, we could introduce a `BK` macro (`BR` is
    already taken), but it just doesn't seem worth the trouble to me.

    The only use I've found in groff's man pages for `br` is immediately
    preceding an `ne` request to manage widows and orphans.  I am hoping
    those won't stay around forever, because (1) we implement `KS` and
    `KE` macros for managing keeps, as discussed earlier on this list,
    and/or (2) we format all paragraphs (and maybe (sub)section
    headings) in a diversion, and then only permit page breaks in
    reasonable locations.[2]  This wouldn't be Knuth-Plass but it would
    be a big help, and I have a sketch in my mind of how to get it done
    with a diversion trap.

[2] Here's the sketch.  Gather (sub)section headings and paragraphs into
    diversions.  Any paragraph of 3 output lines or fewer cannot have a
    page break within it.  Collect the first two output lines of a
    paragraph into a diversion (appending to the one used for the
    heading, if any).  Then start a new diversion for further paragraph
    text.
    Once you've set the fourth output line of a paragraph, measure the
    space available to the bottom of the page (distance to the next page
    location trap \n[.t]).  If this amount is more than the height of
    the (heading and the) first two lines of the paragraph, emit that
    diversion.  Otherwise, break the page, emit both, and stop
    diverting.  This could still leave an orphaned line if a paragraph
    were longer than a page (by one output line at the end), but that
    seems like a rare enough case that it doesn't need to be tackled at
    first.  Maybe someone can see a flaw in this.

    Once worked out for man(7), it could be applied to all of our other
    macro packages.
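
The decision rule in [2] can be modeled roughly as below.  This is a
Python simplification, not macro code: heights are counted in whole
output lines rather than basic units, the function and parameter names
are invented, and applying the same "more than" comparison to short
paragraphs is an assumption, since the sketch states it only for long
ones.

```python
def breaks_before(room, heading_height, para_lines):
    """Decide whether to break the page before a (heading and) paragraph.

    room           -- lines left before the next page location trap
    heading_height -- lines occupied by a pending heading, 0 if none
    para_lines     -- number of output lines in the paragraph
    """
    if para_lines <= 3:
        # A short paragraph may not be split, so all of it is kept
        # together with its heading.
        keep = heading_height + para_lines
    else:
        # Otherwise only the heading and the first two lines are kept
        # together; the rest of the paragraph may flow across pages.
        keep = heading_height + 2
    # Stay on this page only if the room is more than the kept height.
    return not room > keep
```

The model says nothing about the end of a paragraph, so it inherits the
orphaned-last-line case that [2] already concedes.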
