From: G. Branden Robinson
Subject: [bug #54213] [grodvi] Basic Latin ^ and ~ on input map to surprising Unicode code points
Date: Mon, 10 Jan 2022 21:33:41 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Update of bug #54213 (project groff):
Status: None => Invalid
Assigned to: None => gbranden
Open/Closed: Open => Closed
Summary: grodvi: broken ^ and ~ => [grodvi] Basic Latin ^ and ~ on input map to surprising Unicode code points
_______________________________________________________
Follow-up Comment #1:
[comment #0 original submission:]
> grodvi replaces ascii ^ (U+005E) by ˆ (U+02C6) and ~ (U+007E) by ˜
> (U+02DC).
Yes.
> Test case:
>
> $ cat test.man
> .TH test 1
> .BI "perl example: " "$str =~ m/^[a-z]$/;"
>
> $ man -Tdvi ./test.man | dvipdfmx > test.pdf
>
> $ pdftotext test.pdf -
>
> And check that there is correct output: $str =~ m/^[a-z]$/;
> Currently there is ˆ and ˜.
>
> Werner LEMBERG wrote on mailing list that there are already macros for
> textual representation form: \(ha and \(ti. So they should be used for
> ^ and ~ by default.
\(ha and \(ti are not macros, they are special character escape sequences for
accessing spacing forms of the circumflex accent and tilde, respectively.
^ and ~ do _not_ map to spacing forms, but rather to modifier letters. This
is due to *roff heritage going back to the early 1970s when Western Electric
Model 37 Teletypes were used as Unix terminals, for document composition among
other purposes. The ASCII ^ and ~ characters were small and high above the
baseline so that they could be used as accent marks on a base character.
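The code points involved can be checked with a short Python sketch. The table only restates the mapping described in this report for an affected output device such as grodvi. It is an illustration, not something taken from groff's sources, and the names GLYPH_MAP and describe are made up for the example.

```python
import unicodedata

# Hypothetical table restating the mapping described in this report:
# what each input sequence ends up as on an affected output device.
GLYPH_MAP = {
    "^": "\u02c6",       # unescaped circumflex -> modifier letter
    "~": "\u02dc",       # unescaped tilde -> small tilde
    r"\(ha": "\u005e",   # special character escape -> ASCII circumflex
    r"\(ti": "\u007e",   # special character escape -> ASCII tilde
}


def describe(mapping):
    """Return one line per entry: input, code point, Unicode name."""
    lines = []
    for source, glyph in mapping.items():
        lines.append(f"{source:5} U+{ord(glyph):04X} {unicodedata.name(glyph)}")
    return lines


for line in describe(GLYPH_MAP):
    print(line)
```

The Unicode names make the distinction visible: the unescaped characters land on a modifier letter and a small tilde, while the escapes land on the Basic Latin characters that pdftotext was expected to return.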
The example should more properly read as follows.
.TH test 1
.BI "perl example: " "$str =\(ti m/\(ha[a-z]$/;"
(I also would not set a literal example in italics in a man page, but that's a
separate issue.)
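For pages that embed many code snippets, the substitution can be scripted. The following is a minimal sketch, assuming the input is plain program text rather than existing roff markup; the helper name escape_for_roff is hypothetical and not part of groff. It must not be run over a whole man page source, since roff already gives \^ and \~ their own meanings as escape sequences, and it does not handle backslashes in the program text.

```python
def escape_for_roff(code: str) -> str:
    """Replace literal ^ and ~ in program text with groff special
    character escapes.

    Assumption: `code` is plain program text, not roff source, so every
    ^ and ~ in it is meant literally.
    """
    return code.replace("^", r"\(ha").replace("~", r"\(ti")


# The Perl snippet from the original test case.
print(escape_for_roff("$str =~ m/^[a-z]$/;"))
```

Applied to the Perl snippet from the test case, this yields the same argument text as the corrected .BI line above.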
It is probably a good idea to consult the groff_char(7) man page. The page
has been heavily revised since groff 1.22.4. You can get a preview of it at
Michael Kerrisk's Linux man-pages project site.
https://man7.org/linux/man-pages/man7/groff_char.7.html
The groff_man(7) page in groff 1.22.4 also has some advice in this area.
    \(ha   ASCII circumflex accent. Use for syntax elements of
           programming languages because some output devices might
           replace unescaped circumflex accents with non‐ASCII
           glyphs like the Unicode U+02C6 modifier letter
           circumflex.

    \(ti   ASCII tilde. Use for syntax elements of programming
           languages because some output devices might replace
           unescaped tildes with non‐ASCII glyphs like the Unicode
           U+02DC small tilde.
In groff 1.23.0, it is expected that the foregoing material will move to a new
groff_man_style(7) page (and Kerrisk's site already reflects this move), since
it is not specific to the man(7) package, but it is hard to get man page writers
to read general *roff documentation.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?54213>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/