bug-groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug #44018] [PATCH] gtroff hangs in environment::possibly_break_line wi


From: G. Branden Robinson
Subject: [bug #44018] [PATCH] gtroff hangs in environment::possibly_break_line with -mja
Date: Sat, 25 Sep 2021 05:08:50 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Follow-up Comment #4, bug #44018 (project groff):

[comment #3 comment #3:]
> Re comment #2:
> 
> I sense a possible contradiction in this comment: "returning zero" is not
the same as "negative character widths".

Indeed not, and I was not trying to imply that it did.

Perhaps it would be more helpful for me to share a piece of stashed git change
I have, particularly since I need to update this ticket anyway, with the
finding that a negative value from `wcwidth()` _is_ returned via
`font::get_width()`, sometimes, but only if the later is called through the
output driver--I can't get troff itself to provoke this behavior, even by
requesting the offending glyph's width explicitly with the `\w` escape.

The problem crops up if you add an `assert(w > 0)` and then try to build
`pic.html`.  The instrumentation might look a little silly--it's to keep the
terminal from getting blitzed: `font::get_width()` gets called _a lot_.


diff --git a/src/libs/libgroff/font.cpp b/src/libs/libgroff/font.cpp
index 9f578478e..204d86dae 100644
--- a/src/libs/libgroff/font.cpp
+++ b/src/libs/libgroff/font.cpp
@@ -377,11 +377,22 @@ int font::get_width(glyph *g, int point_size)
     return w;
   }
   if (is_unicode) {
-    // Unicode font
+    // The output device uses Unicode encoding.
     int width = 24; // XXX: Add a request to override this.
     int w = wcwidth(get_code(g));
-    if (w > 1)
-      width *= w;
+    assert(w != 0);
+    // XXX: POSIX says that wcwidth() should return -1 if its argument
+    // "does not correspond to a printable wide-character code", but
+    // glibc ~2.28 returns it for printable codes that Unicode Annex #11
+    // calls "ambiguous" in the property "East Asian width".  UAX #11
+    // says that the width is then context-dependent; only our caller
+    // knows this context.  In the future we may need to stop taking the
+    // absolute value and expect the caller to handle a negative width.
+    if (get_code(g) == 8220)
+      debug("GBR: getting width of glyph 8220\n");
+    if (w < 0)
+      debug("GBR: width of glyph %1 is %2\n", get_code(g), w);
+    width *= abs(w);
     if (real_size == unitwidth || font::use_unscaled_charwidths)
       return width;
     else

 
> Regarding the first sentence, for wcwidth(3), a zero return value means the
character takes up no horizontal space.  That is the expected return value for
*many* characters for a variety of reasons, for example zero-width spaces and
combining accents.  So i expect it would be possible to construct a line of
total length zero by including several character, but all of them of wcwidth
zero.

That's not what seems to happen in practice for groff; combining characters
have either been filtered out or have glyph definitions that assign them
positive width.  This makes sense to me--output drivers in general _aren't_
character cell terminals, and it's helpful to know the width of a combining
glyph so that you can center it with respect to the glyph it's combining with,
for instance.

> Regarding the second sentence, for wcwidth(3), a return value of -1 means
that the character is not printable.

glibc 2.28 seems to have its own ideas.  :-/

> In that case, adding that -1 to the line length would indeed be a bug. 
Possible courses of actions are erroring out, inserting a replacement
character, or just skipping the character, whatever seems most helpful in the
case at hand.

Bear in mind here that the value returned by `wcwidth()` is used as a
_multiplier_.  The branch we're in in `font::get_width()` says `if
(is_unicode)`, which restricts us to the grotty and grohtml drivers.

(I know you have a strong distaste for grohtml--can we take that as read?)
 
> Since wcwidth(3) is usually provided by the operating system in the C
library, portable code must make sure that it can deal with any return value
for any character.  Subtle differences exist between the wcwidth(3) return
values of some characters between Linux and BSD and even between different BSD
systems.  In some commercial systems, most notably even the latest releases of
Oracle Solaris, wcwidth(3) is quite badly broken and returns clearly wrong
values for many characters, including -1 or 0 for many characters that should
be width +1 or +2.  On such systems, portable software cannot reasonably avoid
minor output glitches, but it can implement careful and correct handling of
return values to reduce the risk of letting wrong return values go unnoticed,
and even more so to avoid hangs and crashes.

Agreed.  I'm trying to get `font::get_width()` into a place where it is coded
more defensively.

Fixing the problem with `pic.html` generation _might_ be as simple as adding a
glyph definition to devhtml/font/R.proto for it--both `lq` and `rq` are
missing.

There will of course be other glyphs for which glibc's `wcwidth()` returns -1,
perhaps wrongly.  I don't know yet if it will be better to just patch around
that with glyph definitions on the driver side, or do something else.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?44018>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]