
[bug #59962] soelim(1) man page uses pic diagram--should it?


From: G. Branden Robinson
Subject: [bug #59962] soelim(1) man page uses pic diagram--should it?
Date: Tue, 4 May 2021 19:10:34 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Follow-up Comment #6, bug #59962 (project groff):

Hi, Helge!

Thanks for following up.  I apologize that this response is lengthy.  You've
stumbled into some controversial topics.

[comment #4:]
> 1. The headings of soelim are now all lower case instead of upper case (e.g.
> Name, Synopsis, Description instead of NAME, SYNOPSIS, DESCRIPTION)

To further elaborate on Dave's helpful comment, all of groff's man pages have
migrated to mixed-case section and subsection headings.

(And some day I'll get them all migrated to sentence case instead of title
case as well.)

> 2. The markup is partially strange. Usually program names are in B<>, but
> now they are in I<>

Please excuse me--I'm not upset with you at all, but you raise an issue that
provokes me into a rant, which I offer for the record and possible contention
of other groff contributors.

groff's use of italics here is a deliberate stylistic choice.  The Graphic
Systems C/A/T for which Unix troff was written circa 1973 supported three font
styles: roman, bold, and italic.  All three have been used over the years to
mark up the topic part of a man page cross reference.  (Blissfully, nearly
everyone agrees that the parenthesized manual section part should be in
roman.)  Early Unix manuals simply set them in roman; by the time of Unix
Seventh Edition (1979), the man pages were consistent about using italics
(_except_ in the "See also" section, which eschewed font changes altogether). 
A look at the hier(7) page of the various historical Unix releases available
at The Unix Heritage Society (TUHS)
<https://minnie.tuhs.org/cgi-bin/utree.pl> is illustrative.

This practice was pretty consistent up until the 1990s.  Things have gotten
more chaotic since then, possibly thanks to the advent of Linux and BSD
kernels running on consumer PC hardware--more specifically on terminal drivers
written for the PC console.  By the time the luster wore off of VGA text
modes, people had hacked "ANSI" color support onto various terminal drivers
and emulators and called the results compatible with DEC VT100 terminals (when
they weren't--Thomas Dickey is helpful on this
<https://invisible-island.net/xterm/xterm.faq.html#what_vt220>), generally
exhibiting a lot of ignorance of ISO 6429/ECMA-48 escape sequences and of the
existence of the terminfo library altogether.

These days you will find a significant number of people claiming that man page
cross references should have the topic part in bold because it's "literal". 
"Anything literal in man page text", they will tell you, "should be in bold,
and anything variable in italics".

While more correct than the opposite, the foregoing rule is overstated and
over-applied.  One of the principles of pleasant typography is that boldface
should not be used to excess, and in man pages a robotic application of the
above-quoted principle does indeed frequently lead to excess.

Worse, many champions of the above inflexible principle merrily boldface their
man page cross references without using the \% hyphenation escape (of which
they have no knowledge, but see below) at the beginning to suppress
hyphenation of the topic word, so you will often end up with a hyphen and a
line break in the middle of this supposedly scrupulously "literal" man page
topic word.  Users who believe what the principle's expounders have been
peddling will be frustrated when they attempt to copy and paste such
"literals".

Nevertheless some people remain violently committed to boldface topic names in
man page cross references.  And while I think they're wrong and inconsistent
with Unix historical practice and good typographical style, this isn't really
where the problem lies.

The problem is that the man(7) macro language doesn't know what a man page
cross reference is.  The insufficiency of semantic cues is one of the problems
that motivated the development of the mdoc(7) macro language used by the BSDs.
Unfortunately, that language is much larger than man(7), which is already
poorly used and poorly understood by many hackers, who are also frequently
resistant to learning much about it as they proclaim their preference for
Markdown, perldoc, or DocBook instead.

Much more importantly, from the standpoint of an integrated documentation
system, the fact that man(7) doesn't know what a man page cross reference
is means that it cannot hyperlink between man pages in spite of all the
necessary ingredients being present...almost.

A solution to the debacle exists. Long story medium, with a new macro MR for
man page cross references and a new string, say MF (or maybe XF), which
specifies the font to be used when rendering the topic part of the cross
reference, we can solve all of these problems.  (1) Man page readers, not
authors, can configure whichever font they want to see for the topic in cross
references.  (2) Hyphenation will be suppressed in the topic with no extra
effort or learning.  (3) Semantic cross-reference information will be
available in (3a) the device-independent output format, making life easier for
anything that cares to process it, for instance to generate hyperlinks and
(3b) the source document for the benefit of the many tools that (attempt to)
parse man(7) language directly.
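
To make the idea concrete, here is a minimal sketch of how such a macro might
look in a macro package.  The names MR and MF are the ones proposed above; the
argument order and the definition itself are my assumptions for illustration,
not an implementation that exists in groff.

```roff
.\" Hypothetical sketch; illustrative only, not groff's implementation.
.\" MF holds the font for the topic part; a reader could redefine it.
.ds MF I
.\" Assumed usage: .MR topic section [trailing-text]
.de MR
\f[\\*[MF]]\%\\$1\f[R](\\$2)\\$3
..
See
.MR soelim 1 .
```

In this sketch the page author writes only the topic and section.  The font
comes from MF, which a site or user configuration file could set to B instead,
and the \% is supplied by the macro so the author never has to remember it.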

The downside?  It will take many, many years for pages to migrate.  I expect
we'll still be reading unadapted man pages when we're discovering signed
32-bit overflows in "enterprise" Linux distributions in 2038.

It's on my to-do list to implement this, and maybe it will be in the groff
1.23.0 release.  You can read more here
<https://lists.gnu.org/archive/html/groff/2020-08/msg00068.html>.  (Note that
my observations in that message about font styling practices have been
modified as above per research I've done in the intervening months.)

> (with an additional \\% at the beginning, but I don't complain about the
> latter).

That's the *roff hyphenation character.  At the beginning of a word it
suppresses hyphenation.  This is approximately a 50-year-old syntactical
feature.  :)
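
As a small illustration in man(7) source, using this page's own topic name:

```roff
.\" Without \%, "soelim" may be hyphenated at a line break.
.BR soelim (1)
.\" With \% at the start of the word, hyphenation is suppressed.
.IR \%soelim (1)
```

The second form is the style groff's own pages use: italic topic, roman
section, and no risk of a hyphen appearing inside the topic name.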

> In a KDE konsole this looks strange (underlined instead of bold), in a VT
> visually this does not matter.

Underlining is to be expected; most terminal emulators do not support italics,
so the TTY output driver for groff tells the terminal to use underlining
instead.  See grotty(1) for more information.

I confess that I go many months without checking man page rendering in VTs. 
But if this is something you do I am intensely interested in any readability
problems you encounter.

> 3. Usually B<> is not additionally quoted, I noticed that it is now in the
> new version, e.g. \\[lq]B<...>\\[rq].

Again, this is a stylistic choice.  Bolded content sometimes becomes ambiguous
if the bold attribute is lost or stripped away, which can happen when people
use the man page interface clumsily (or render man pages to an extremely
primitive device).  My rule of man page maintenance/authorship is to introduce
quotes if such confusion is a significant risk, and not otherwise.  Two rules
of thumb are:

(1) If the bolded content contains spaces, quote the content as well; if the
bold is stripped away, the reader won't be able to tell where the bolding
stops.

(2) If the bolded content starts with letters or punctuation that might be
parsed as part of the sentence, quote the content as well.  For example, some
of the ms(7) macros take an optional argument "no" that changes their
behavior.  The construction "if you give this macro a no, this heading is
suppressed" is hard for a human to parse on the first reading.

Counterexamples, where I seldom if ever find supplemental quotation helpful,
are sequences beginning with a backslash (which has no function in natural
languages) or Unix option dashes (which are ambiguous in principle, but almost
never in context, especially if one follows the style rule which holds that em
dashes used to interrupt a sentence should not have any adjacent space).
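
A rough sketch of how these rules of thumb might look in man(7) source follows.
The sample sentences, the "make install" command, and the "-r" option are made
up for illustration and are not taken from any particular page.

```roff
.\" Rule (1): the bold text contains a space, so quote it as well.
Run
.RB \[lq] "make install" \[rq]
as root.
.\" Rule (2): "no" could be read as part of the sentence, so quote it.
If you give this macro a
.RB \[lq] no \[rq]
argument, the heading is suppressed.
.\" Counterexamples: a leading backslash or option dash needs no quotes.
The
.B \[rs]%
escape and the
.B \-r
option are unambiguous.
```

If the bold attribute is lost, the first two cases still read correctly
because the quotation marks survive; the last two never needed the help.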

> 4. Regarding the picture, the toolchain of manpages-l10n still does not
> integrate it but tells me it is in a separate file. This is a little
> unfortunate, but right now I do not have the time to debug this. And the
> downside is only the missing translation. Therefore please close this bug for
> now. If I ever investigate this further, I would open (if necessary) a new
> bug.

Whew.  It took me a while to get back from all of my soap boxes to the actual
topic. 😬

I am not sure what you mean by the picture being "in a separate file", but I
will assume (unless you correct me) that this is an artifact of the way the
manpages-l10n tool chain works, and not a defect in groff's soelim(1) man page
per se.

> 5. The line graphics was changed. On my system, the arrows are displayed
> fine in a VT, but not the lines. Both display fine in a KDE konsole. Maybe you
> want to keep the previous line and just use the arrows? (But most users
> probably will use a terminal program in a graphical environment, so this
> really is very minor)

I will check this out.  I'm a bit loath to change the input because it is
correct with respect to my present understanding.  What I think is happening
is that the VT driver has no glyphs for the Unicode box drawing characters, so
it doesn't render them.

This is a tricky problem for groff because it--specifically, grotty(1), the
output driver for terminal devices--has no way to query the terminal device
regarding its repertoire of supported glyphs or Unicode code points[1].  This
isn't a matter of the necessary code not being written yet (in groff); as far
as I know, there is _no interface_ for such a thing in the Unix or Linux
terminal APIs, nor in the ISO 6429/ECMA-48 control sequence scheme for passing
such information.

My dim recollections of the Linux VT implementation are that (1) it was
written back when ISO 8859 character sets were all that most people bothered
to use (UTF-8 didn't seriously start to take wing on Linux until after 2000,
thanks to folks like Markus Kuhn); (2) they were limited to 256 code points;
and (3) there was some VGA hack that enabled 512 different glyphs on the
screen if you disabled one of the character attributes.  I don't remember
which it was, but memory of this point is one of the factors that makes me
paranoid about text attributes being stripped in rendered man pages.

> So I would suggest to address 1. and 2. (and maybe 3./5.) and then close
> this bug.
> 
> Thanks again for your work.

Thank you for yours, and for your patience with my reply!  If you can shed
light on any of these issues, please follow up.

Regards,
Branden 

[1] People familiar with groff may wonder about this claim given the existence
of '.if c', '.fchar', and similar.  As I understand it, these work at the
interface between groff language input and the output driver.  But the problem
we have with VTs not drawing these line characters is that the Linux terminal
driver can't tell grotty that it doesn't support them.  However, the Linux VT
driver _does_ (or did, the last time I checked, months or years ago) support
the glyphs in the DEC ACS (alternate character set), which includes
box-drawing characters.  It therefore seems to me that when the Linux VT
driver is in UTF-8 mode and it hits one of the code points for which a DEC ACS
symbol is entirely adequate, it should render that ACS symbol instead.
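
For readers unfamiliar with those requests, a minimal sketch of their use is
below.  The glyphs chosen are only examples.  The point is that both requests
consult what groff believes about the output device, which is why they cannot
notice a console font that lacks the box-drawing characters.

```roff
.\" Sketch: these tests consult groff's description of the output device,
.\" not the terminal that eventually displays the text.
.\" Use the rightwards arrow if the device claims it, else ASCII.
.ie c \[->] \[->]
.el \->
.\" Fallback glyph, used only if no font for the device supplies U+2500.
.fchar \[u2500] \-
```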

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59962>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



