Re: man(7), hyphen, and minus


From: Russ Allbery
Subject: Re: man(7), hyphen, and minus
Date: Sat, 24 Dec 2022 14:43:44 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

"G. Branden Robinson" <g.branden.robinson@gmail.com> writes:
> At 2022-12-23T12:49:15-0800, Russ Allbery wrote:

>> I've been curious: how much use do you see of groff outside of man
>> pages?

> Others have answered this but I would also point you to Ralph Corderoy's
> page on the subject.

> https://www.troff.org/pubs.html

> It hasn't been updated since about 2006, I think, which means it has
> missed a few publications since then, like _The Go Programming Language_
> and Kernighan's _UNIX: A History and Memoir_.

Thanks!  Happy to see the continuing usage!

I probably should have assumed.  One of the things that I've noticed over
and over about free software is that nothing new ever truly replaces
something old in a comprehensive sense.  I can think of very few programs
that truly no one is using any more, because once the source code is
available to keep them alive, someone will keep them alive.  It makes for
a rather interesting diversity of software (and other things; for
instance, I still use Usenet).

> The groff_man(7) page has long attempted to prescribe a reasonably
> portable, reduced subset of the roff language for use in man pages.
> mandoc maintainer Ingo Schwarze and I spent some time prior to groff
> 1.22.4's release hammering that out in further detail.

Oh, so I was going to mention: currently, Pod::Man rolls its own macros
for verbatim text:

.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..

This looks basically equivalent to .EX/.EE, so I thought about using those
macros (and defining my own if they're not available, at least until no
one is using older implementations that don't have them).  But the main
thing that .EX doesn't support that the long-standing Pod::Man behavior
does is the .ne invocation, which is used like this:

    # Get a count of the number of lines before the first blank line, which
    # we'll pass to .Vb as its parameter.  This tells *roff to keep that many
    # lines together.  We don't want to tell *roff to keep huge blocks
    # together.
    my @lines = split (m{ \n }xms, $text);
    my $unbroken = 0;
    for my $line (@lines) {
        last if $line =~ m{ \A \s* \z }xms;
        $unbroken++;
    }
    if ($unbroken > 12) {
        $unbroken = 10;
    }

This logic is very long-standing and was designed for troff printing of a
manual page (and older nroff setups that still did pagination) to avoid
unnecessary page breaks in the middle of a verbatim block.  I'm not sure
how much this matters given how people use man pages these days, but I
hate to break it for no reason.  So I think I'd need to add an .ne line
after (before?) the .EE macro if I switched to it?
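
A rough sketch of what that switch might look like, assuming the .ne
request goes before .EX.  The reasoning is that .ne tests the space
remaining at the point where it is invoked, so it has to precede the lines
it protects.  verbatim_block is a hypothetical helper, not an existing
Pod::Man method, and it assumes the formatter already provides .EX and .EE:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Pod::Man: wrap verbatim text in .EX/.EE
# and emit the .ne request first, so that the space check happens before
# any line of the block is output.
sub verbatim_block {
    my ($text) = @_;

    # Count the lines before the first blank line, as in the logic above.
    my $unbroken = 0;
    for my $line (split(m{ \n }xms, $text)) {
        last if $line =~ m{ \A \s* \z }xms;
        $unbroken++;
    }
    if ($unbroken > 12) {
        $unbroken = 10;
    }

    # Real output would also need to escape backslashes and protect lines
    # that start with . or '; that is left out of this sketch.
    $text =~ s{ \n+ \z }{}xms;
    return ".ne $unbroken\n.EX\n$text\n.EE\n";
}

print verbatim_block("ls -l\ncd /tmp\n\necho done\n");
```

With no scaling unit given, .ne counts in vertical line spaces, so the
line count can be passed through unchanged, the same way .Vb does it.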

> It's called Pod::_Man_: why would people use it for anything that isn't
> a manual page?

Okay, fair.  :)  Although historically people sometimes did, and of course
once upon a time people would sometimes typeset the full manual for
something with troff.  That output probably isn't as nice as it used to be,
since I have subsequently dropped a lot of the attempted magic that only
applied to troff output (replacing paired " quotes with `` '', adding
small caps to long strings of all capital letters, and things like that)
because they were all using scary regexes and occasionally broke things
and mangled things in weird ways, causing lots of maintenance issues.

> Yes.  But there are two problems to solve: (1) acceptance of Unicode
> (probably just UTF-8) input

I was pleasantly surprised at how well this just worked with the man-db
setup on a Debian system, although I think that may involve a fair amount
of preprocessing.

> It has been possible for many years (since well before groff 1.22.3) to
> specify any Unicode code point for output.

Just to provide additional detail for the record (and this is almost
certainly the sort of thing you mean by "acceptance of Unicode input")
here's the simple document I was using for some testing.

https://raw.githubusercontent.com/rra/podlators/main/t/data/man/encoding.utf8

% groff -man -Tpdf -k encoding.utf8 > encoding.pdf
troff: encoding.utf8:72: warning: can't find special character 'u0308'
troff: encoding.utf8:74: warning: can't find special character 'u1F600'

u1F600 is presumably a problem with the output font, but u0308 is a
combining accent mark that groff does definitely support, just not as a
separate character.  (Without preconv, one instead gets mojibake, as I
expected.)

My theory was that combining accent marks pose a bit of an interesting
issue for groff because groff probably shouldn't think of them as a
separate output character that can be mapped in an output font, but
instead needs to essentially transform them into something like
\[u0069_0308] during the input processing.  (This may therefore
essentially be a preconv bug as opposed to a troff bug, and maybe nroff
gets away with it because it can just copy combining accent marks to the
output device and let xterm take care of rendering.)
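
To make that theory concrete, here is a minimal sketch of the kind of
transformation I mean, done on the generating side rather than in preconv.
fold_combining and composite_escape are hypothetical names, not anything
in Pod::Man or groff, and the sketch assumes the input is already a decoded
character string rather than raw UTF-8 bytes:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: build a groff composite escape such as
# \[u0069_0308] from a base character and its combining marks.
sub composite_escape {
    my ($base, $marks) = @_;
    my @points = map { sprintf('%04X', ord($_)) } ($base, split(//, $marks));
    return '\\[u' . join('_', @points) . ']';
}

# Hypothetical helper: replace each base character that is followed by one
# or more combining marks with a single composite escape.  Characters that
# have no combining marks attached are left alone.
sub fold_combining {
    my ($text) = @_;

    # Composite names are built from decomposed code points, so normalize
    # to NFD first.  This also splits precomposed characters.
    $text = NFD($text);

    $text =~ s{ (\P{M}) (\p{M}+) }{ composite_escape($1, $2) }xmsge;
    return $text;
}

print fold_combining("i\x{0308}"), "\n";
```

This treats any non-mark character as a possible base, including spaces,
which is probably too simplistic for real input.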

It all makes sense when viewed through the lens of the *roff language, but
of course in the Unicode world one expects to be able to just produce a
stream of code points and have everything cope.

> Heirloom Doctools is a descendant of AT&T troff; among other things, it
> provides its own man(7) implementation, a lineal descendant of Doug
> McIlroy's 1979 original.  It _can_ and _does_ render man pages.  Whether
> any *nix distribution ("platform"?) ships Heirloom as its sole or
> preferred *roff, I don't know.  I wouldn't be surprised if at least one
> BSD does, for the usual reasons of GPL antipathy[2].  About 15 years ago
> it undertook a major effort to clone groff features, and it is
> reasonably groff compatible when configured to be (`-mg` flag, `xflag`
> request, and whatnot).

Thanks for the background!

> [1] https://www.gnu.org/software/groff/groff-mission-statement.html

This is great.

I am sad that currently Pod::Man is one of the impediments to good
rendering of manual pages in other formats, since I make use of more of
the *roff language (mostly to work around bugs) than those tools often
understand.  So I have an incentive to want to simplify the output as much
as I can, consistent with remaining portable.

> [2] The CDDL is way _more_ free than the GNU GPL, you see, because it is
>     a copyleft _and_ has a choice-of-law clause, and someday the BSDs
>     will have an island microstate nullifying all copyleft licenses.

Don't look at me, I release everything under an MIT license.  :)

-- 
Russ Allbery (eagle@eyrie.org)             <https://www.eyrie.org/~eagle/>
