bug-groff

From: G. Branden Robinson
Subject: [bug #54213] [grodvi] Basic Latin ^ and ~ on input map to surprising Unicode code points
Date: Mon, 10 Jan 2022 21:33:41 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Update of bug #54213 (project groff):

                  Status:                    None => Invalid                
             Assigned to:                    None => gbranden               
             Open/Closed:                    Open => Closed                 
                 Summary:  grodvi: broken ^ and ~ => [grodvi] Basic Latin ^
and ~ on input map to surprising Unicode code points

    _______________________________________________________

Follow-up Comment #1:

[comment #0 original submission:]
> grodvi replaces ascii ^ (U+005E) by ˆ (U+02C6) and ~ (U+007E) by ˜
> (U+02DC).

Yes.

> Test case:
> 
> $ cat test.man
> .TH test 1
> .BI "perl example: " "$str =~ m/^[a-z]$/;"
> 
> $ man -Tdvi ./test.man | dvipdfmx > test.pdf
> 
> $ pdftotext test.pdf -
> 
> And check that there is correct output: $str =~ m/^[a-z]$/;
> Currently there is ˆ and ˜.
> 
> Werner LEMBERG wrote on mailing list that there are already macros for
> textual representation form: \(ha and \(ti. So they should be used for ^ and ~
> by default.

\(ha and \(ti are not macros; they are special character escape sequences for
accessing spacing forms of the circumflex accent and tilde, respectively.

^ and ~ do _not_ map to spacing forms, but rather to modifier letters.  This
is due to *roff heritage going back to the early 1970s, when Western Electric
Model 37 Teletypes were used as Unix terminals, for document composition among
other purposes.  On those terminals the ASCII ^ and ~ glyphs were small and sat
high above the baseline so that they could be used as accent marks on a base
character.
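
To see which characters are involved, here is a minimal Python sketch (not
part of groff) that prints the code point and Unicode name of each character
mentioned in the report.  The helper name describe() is made up for this
illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The ASCII input characters, each followed by the glyph that showed up
# in the text extracted from the DVI-derived PDF.
for ch in ("^", "\u02c6", "~", "\u02dc"):
    print(describe(ch))
```

The two ASCII characters and the two glyphs in the extracted text are distinct
code points, which is why copying "^" or "~" out of the PDF does not yield the
characters that were typed.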

The example should more properly read as follows.


.TH test 1
.BI "perl example: " "$str =\(ti m/\(ha[a-z]$/;"


(I also would not set a literal example in italics in a man page, but that's a
separate issue.)
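
If you have many literal code samples to convert, the substitution can be
scripted.  The following is only a sketch under stated assumptions: the
function name escape_for_man() is hypothetical, and it assumes its input is
plain literal text, not roff source that already contains escape sequences
(it would mangle an existing \~, for example).

```python
def escape_for_man(literal: str) -> str:
    """Replace ASCII ^ and ~ with the groff special character escapes
    \\(ha and \\(ti.  Hypothetical helper; assumes the input contains no
    roff escape sequences of its own."""
    return literal.replace("^", r"\(ha").replace("~", r"\(ti")


# The Perl fragment from the original test case.
print(escape_for_man("$str =~ m/^[a-z]$/;"))
```

Applied to the Perl fragment from the test case, this produces the same
argument text as the corrected .BI line above.  Other characters that need
care in man pages, such as backslashes, are deliberately not handled here.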

It is probably a good idea to consult the groff_char(7) man page.  The page
has been heavily revised since groff 1.22.4.  You can get a preview of it at
Michael Kerrisk's Linux man-pages project site:
https://man7.org/linux/man-pages/man7/groff_char.7.html

The groff_man(7) page in groff 1.22.4 also has some advice in this area.


       \(ha   ASCII circumflex accent.  Use  for  syntax  elements  of
              programming  languages because some output devices might
              replace  unescaped  circumflex  accents  with  non‐ASCII
              glyphs  like  the Unicode U+02C6 modifier letter circum‐
              flex.

       \(ti   ASCII tilde.  Use for  syntax  elements  of  programming
              languages  because some output devices might replace un‐
              escaped tildes with non‐ASCII glyphs  like  the  Unicode
              U+02DC small tilde.


In groff 1.23.0, the foregoing material is expected to move to a new
groff_man_style(7) page (Kerrisk's site already reflects this move).  The
advice is not specific to the man(7) package, but it is hard to get man page
writers to read general *roff documentation.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?54213>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/