[bug #63334] \[u....] syntax for ASCII characters handled inconsistently

bug-groff

From:	Dave
Subject:	[bug #63334] \[u....] syntax for ASCII characters handled inconsistently
Date:	Tue, 8 Nov 2022 04:34:05 -0500 (EST)

URL:
  <https://savannah.gnu.org/bugs/?63334>

                 Summary: \[u....] syntax for ASCII characters handled
inconsistently
                 Project: GNU troff
               Submitter: barx
               Submitted: Tue 08 Nov 2022 03:34:02 AM CST
                Category: Core
                Severity: 2 - Minor
              Item Group: Warning/Suspicious behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Tue 08 Nov 2022 03:34:02 AM CST By: Dave <barx>
I see the same behavior in groff 1.22.4 and in the latest git code.  (And for
that matter, going all the way back to at least 1.19.2.)

ASCII characters represented in \[u....] form are handled inconsistently.  A
simple demonstration of the difference:

$ echo '\[u0021]\[u0022]' | nroff | cat -s
troff: <standard input>:1: warning: can't find special character '\!'
"


\[u0022] is correctly converted to, and output as, a quotation mark.  But
\[u0021], rather than being converted to a "!", is for some reason converted
to the sequence "\!", which (unsurprisingly) is not a recognized character.

It's not clear to me what internal mechanism might cause this: if "\[u0021]"
were parsed as a backslash followed by "[u0021]", the bracketed sequence
wouldn't be specially interpreted at all.

Looking at all the pre-alphabet ASCII symbols:

$ printf "\\[u%04x] " $(seq 32 64) | nroff | cat -s

Five of them are handled as expected, 15 are converted to unrecognized
backslash-prefixed special characters (as with "\!" above), and 13 are not
recognized at all.
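
To see which code point falls into which group, it may help to feed troff one
escape per input line, so that the line number in each warning identifies the
code point (line n corresponds to code point 31 + n).  A minimal sketch follows;
the function name gen_escapes is made up for illustration, and it emits
uppercase hex digits on the assumption that groff expects them in the uXXXX
form, which differs from the lowercase %04x used in the command above.

```shell
#!/bin/sh
# Emit one \[uXXXX] escape per line for code points 32..64 (decimal), so
# that the line number in a troff warning identifies the code point.
# Assumption: uppercase hex digits (%04X) are what groff expects in uXXXX.
gen_escapes() {
    cp=32
    while [ "$cp" -le 64 ]; do
        # '\\' in the printf format prints a single backslash.
        printf '\\[u%04X]\n' "$cp"
        cp=$((cp + 1))
    done
}

gen_escapes
```

Piping the result through the same tools as before, for example
"gen_escapes | nroff 2>&1 | cat -s", should then report any warning for
\[u0021] against line 2 of standard input.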

That last case I don't consider a bug, since (current) groff does not specify
that any of them should be recognized.  (The 1.22.4 groff_char(7) page sort of
gave the impression that some of them would be, but these sequences have been
removed from the drastically rewritten 1.23 groff_char(7).)  Arguably, none of
this is a bug, since no documentation explicitly states that, for example,
"\[u0021]" will be recognized as "!".  But the way it _is_ handled is
surprising enough that I wanted to at least bring it to the development team's
attention.







    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63334>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
