Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character
From: G. Branden Robinson
Subject: Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character
Date: Wed, 26 Jul 2023 10:35:50 -0500
Hi Thomas,
At 2023-07-26T10:47:05+0200, Thomas ten Cate wrote:
> In the bash manual page (`man bash`), the ASCII tilde character '~'
> (0x7e) is replaced by the Unicode character '˜' (U+02DC SMALL TILDE):
>
> $ man bash | grep 'additional binary operator'
> An additional binary operator, =˜, is available,
>
> The same happens for the use of ~ as a shorthand for the home
> directory. This makes the manual page incorrect, and difficult to
> search.
>
> It looks like there is an ASCII tilde character in the man page's
> source code:
>
> $ gunzip -c /usr/share/man/man1/bash.1.gz | grep 'additional binary operator'
> An additional binary operator, \fB=~\fP, is available, with the same
>
> I don't know the first thing about groff, but `man groff_char`
> suggests that ~ is indeed rendered as "modifier tilde", and that one
> should write \(ti to obtain an actual tilde character.
I know a little about groff. Your advice is fine for man pages that
target only groff[1] and/or mandoc[2], but not Heirloom Doctools
troff,[3] neatroff[4] or Plan 9 troff (in its original form or as
maintained in Plan 9 from User Space[5]), and not legacy implementations
descended from AT&T troff that are, as far as I can tell, unmaintained
by the few Unix System V vendors that still exist.[6][7]
Many projects don't need to worry about such extreme portability in
their man pages, but GNU Bash arguably does. (I'm open to correction.)
Furthermore, in the *roff language itself, as originally implemented by
Joe Ossanna (and re-implemented by Brian Kernighan) there is no good
way to test for the existence of a special character.[8]
As a first stab at it, I'd divide the world into two camps: (a) groff
and mandoc(1), and (b) everything else, and not worry about (b).
The bash(1) man page has an extensive preamble already that still
includes a workaround for 4.3BSD(!), so adding a little bit to it to
accommodate systems developed since 1990 might not be too disruptive.
I'm attaching a straw man diff to the bash(1) page. If Chet likes it,
I'm happy to prepare one against the bash devel branch.
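A preamble along the following lines is one way to express the two-camp split in *roff. This is a hypothetical sketch, not the contents of the attached bash.1.diff. It assumes that the `.g` register is nonzero under groff and under mandoc (which emulates it, as the `\(aq` test in pod2man-generated pages relies on) and zero in other troffs. The string names `Ti` and `Ha` are invented for illustration.

```roff
.\" Hypothetical sketch; the string names Ti and Ha are illustrative only.
.\" Camp (a): groff sets register .g to 1 and mandoc emulates it, so use
.\" the special characters that always yield ASCII ~ and ^ there.
.\" Camp (b): every other formatter keeps the current behavior.
.ie \n(.g \{\
.  ds Ti \(ti
.  ds Ha \(ha
.\}
.el \{\
.  ds Ti ~
.  ds Ha ^
.\}
```

A source line such as `An additional binary operator, \fB=\*(Ti\fP, is available,` would then produce a searchable ASCII tilde under groff and mandoc, while camp (b) formatters see the same literal `~` they do today.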
bash(1) also attempts to select a font named "CW" in places, which is
another portability problem (it's a Unix System III [and later] troff
font name that was available on _some_ output devices). But I'd like to
see how we get over this bridge before I try to cross that one. :)
> I'm guessing the manpage is generated from texinfo, so if this is
> actually a bug in texinfo, feel free to forward this email to
> bug-texinfo at gnu.org.
I don't think that's actually true. As far as I know, Chet maintains
Bash's Texinfo docs and man pages in parallel by hand.
Regards,
Branden
[1] https://www.gnu.org/software/groff/
[2] https://mandoc.bsd.lv/
[3] https://github.com/n-t-roff/heirloom-doctools
[4] https://github.com/aligrudi/neatroff
[5] https://github.com/9fans/plan9port
[6] HP-UX 11 appears to still ship an AT&T/DWB or System V troff.
Solaris 10 does, but it is nearing end-of-life and Solaris 11
replaced its troff (of similar lineage as HP-UX's) with groff.
[7] It is also not hard to make AT&T-descended troffs support the
`ha` and `ti` special characters. For instance, here's a patch to
Documenter's Workbench (DWB) 3.3 troff's "Latin1" output device.
--- R.orig 2023-07-26 09:55:30.527340674 -0500
+++ R 2023-07-26 09:58:49.658662373 -0500
@@ -68,6 +68,7 @@
 bs "
 ] 33 3 93
 ^ 33 2 147
+ha "
 --- 47 2 94
 --- 50 1 95
 ` 33 2 96
@@ -101,6 +102,7 @@
 --- 20 2 124
 } 48 3 125
 ~ 33 2 148
+ti "
 --- 54 0 126
 \` 33 2 145
 ga "
But even more than 30 years after groff emerged on the scene, I'm not
aware of a single such troff having done this.
[8] A clever *roff hacker could try using the output comparison operator
and width computation escape sequence to measure the width of a candidate
special character, but this would not be reliable. The output
drivers of AT&T device-independent troff appear to format
unrecognized characters as blanks (putting horizontal motions on the
output). (groff does not, throwing an error diagnostic instead.)[9]
But if a special character did exist and happened to be the same
width as such a blank character, this test would produce a false
negative. Worse, on nroff-mode devices, including the terminal
emulators on which 99% of all man page reading is done, _all_ glyphs
are the same width, so you'd get false negatives all the time.
[9] This is a groff/AT&T troff difference that I don't think is
documented by groff. Maybe I should fix that.
Attachments:
bash.1.diff (text data)
signature.asc (PGP signature)