bug-groff

[bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented


From: G. Branden Robinson
Subject: [bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
Date: Wed, 13 Apr 2022 23:36:23 -0400 (EDT)

Follow-up Comment #7, bug #58962 (project groff):

Hi, Dave!

[comment #6 comment #6:]
> [comment #5 comment #5:]
> > I believe I've cracked this.
> 
> Great news!
> 
> > $ xxd EXPERIMENTS/dave-58962.roff
> 
> I've also attached this file so anyone else who wants to run it doesn't have
> to reconstruct it from hex.

I didn't because I fear encoding mangling.
 
> > It seems like these cases just weren't ever dealt with in
> > the formatter's input parser.
> 
> When I run unpatched groff (even as far back as groff 1.19.2), I get the
> second line of stderr output, but not the first.  So 0xAD _was_ being handled
> somewhere, though seemingly not in token::next().

Try unpatched troff with the -R flag to shut off the loading of troffrc.

This is why I said "the formatter's input parser".

The tmac files for the various input encodings remap the soft hyphen with a
`tr` request.  I plan to take that stuff out.
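
For anyone who wants to see the difference for themselves, here is a minimal
sketch of the comparison.  The file name and its contents are hypothetical
stand-ins, not the attachment from the bug; the only assumptions are that line
1 carries a raw 0xA0 byte and line 2 a raw 0xAD byte.  The first run uses -R,
so any warnings come from the formatter alone; the second lets troffrc and
whatever it sources take effect first.

```shell
#!/bin/sh
# Hypothetical stand-in for the test file; NOT the attachment from the bug.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
input=$workdir/nbsp-shy.roff

# Line 1 contains a raw 0xA0 byte (octal 240), line 2 a raw 0xAD (octal 255).
printf 'a\240b\na\255b\n' > "$input"

if command -v troff >/dev/null 2>&1; then
    # -R: do not load troffrc, so no macro-file remapping is in effect.
    echo "== troff -R ==" >&2
    troff -R "$input" >/dev/null

    # Default startup: troffrc and the files it sources are read first.
    echo "== troff ==" >&2
    troff "$input" >/dev/null
else
    echo "troff not found; skipping the comparison" >&2
fi
```

Compare the diagnostics on stderr between the two runs; the formatted output
itself is discarded.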
 
> And while an input 0xA0 didn't match anything according to the
> output-comparison operator, it did get interpreted as a fixed-width
> nonbreaking space ("\ " to groff) in the output, so it too was being handled,
> just not in the way one might have hoped (or, more to the point, not in the
> way that was documented).

I _think_ what's happening here is that it's being handled not exactly as a
space, but as an undefined glyph for which no information is available.  I
haven't chased it down, but I'm guessing that when that happens you get an
unbreakable space of 'spacewidth' size in the current font.

> That is, while the patch as written solves the reported problems, since it
> does so only by adding new logic, it would also seem to leave in place some
> now-dead code where these two characters were previously handled.  Whether
> this is of any practical concern, I leave to you to determine.

Right now, I think the extra handling of SHY by the input encoding macro files
is superfluous and should go.

For unencoded characters, there's probably not much to do beyond what is
already done.  Warn, advance the drawing position, carry on, and _don't_
match (in the output comparison operator) something that _is_ defined.


$ troff -v
GNU troff (groff) version 1.22.4
$ troff -R ./EXPERIMENTS/dave-58962.roff 
troff: ./EXPERIMENTS/dave-58962.roff:1: warning: can't find character with input code 160
troff: ./EXPERIMENTS/dave-58962.roff:2: warning: can't find character with input code 173



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58962>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



