[bug #59397] Assign default .hcode values to alphabetic characters in gr

bug-groff


From:	Dave
Subject:	[bug #59397] Assign default .hcode values to alphabetic characters in groff's default character set
Date:	Mon, 2 Nov 2020 04:00:00 -0500 (EST)
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0

URL:
  <https://savannah.gnu.org/bugs/?59397>

                 Summary: Assign default .hcode values to alphabetic
                          characters in groff's default character set
                 Project: GNU troff
            Submitted by: barx
            Submitted on: Mon 02 Nov 2020 02:59:58 AM CST
                Category: Core
                Severity: 1 - Wish
              Item Group: New feature
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

This is copied wholesale from two comments in bug #42870 (both of which I
think I wrote) that are only tangentially related to the topic of that bug.
This is really a separate issue that deserves its own report.

== The problem ==

Groff's default hyphenation codes do not cover its default input character
set, Latin-1: they are assigned only to ASCII alphabetic characters.  By
default groff should assign hyphenation codes to all alphabetic characters in
the Latin-1 character set, to reflect the default input character set.
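
To make the request concrete, here is a minimal sketch of the mapping being
asked for, written without any dependence on the C locale.  The function names
are hypothetical and are not part of groff.  It assumes that the hyphenation
code of a letter is its lowercase form (as the existing ASCII defaults
suggest), treats 0xD7 and 0xF7 as non-letters, and deliberately leaves out
0xAA, 0xB5 and 0xBA, whose status as letters is a judgment call.

```cpp
// Hypothetical helpers, not groff code: classify ISO-8859-1 letters
// using fixed code point ranges instead of the C locale.

static bool latin1_is_upper(unsigned char c)
{
  // A-Z, plus 0xC0-0xDE except 0xD7 (the multiplication sign).
  return (c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7);
}

static bool latin1_is_lower(unsigned char c)
{
  // a-z, plus 0xDF-0xFF except 0xF7 (the division sign).
  // 0xDF and 0xFF have no uppercase counterpart in Latin-1.
  return (c >= 'a' && c <= 'z') || (c >= 0xDF && c != 0xF7);
}

// Returns the proposed default hyphenation code for c, or 0 if c is
// not treated as a letter.  In both the ASCII and the Latin-1 ranges
// an uppercase letter sits 0x20 below its lowercase form.
static unsigned char latin1_hcode(unsigned char c)
{
  if (latin1_is_upper(c))
    return static_cast<unsigned char>(c + 0x20);
  if (latin1_is_lower(c))
    return c;
  return 0;
}
```

With this sketch, latin1_hcode(0xC9) yields 0xE9 (the lowercase form of the
same accented letter), while 0xD7 and digits yield 0.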

== Analysis ==

init_charset_table() in src/roff/troff/input.cpp appears to be what defines
the default hcode values, in particular the lines:


  for (int i = 0; i < 256; i++) {
...
    if (csalpha(i))
      charset_table[i]->set_hyphenation_code(cmlower(i));
  }


So the csalpha() call must be returning false for any characters that are
ISO-8859-1 (a.k.a. Latin-1) alphabetic characters but outside the ASCII
range.

Indeed, a peek into cset_init::cset_init() in src/libs/libgroff/cset.cpp
supports this:


  for (int i = 0; i <= UCHAR_MAX; i++) {
    csalpha.v[i] = ISASCII(i) && isalpha(i);
...
  }


The isalpha() function is declared in the C standard library's <ctype.h>, and
its return value depends on the current locale.  Groff assumes ISO-8859-1
input by default, so it's undesirable for this function's behavior to change
based on the user's environment; it's for this reason, I presume, that the
additional test ISASCII() is imposed, forcing the result to 0 for non-ASCII
characters regardless of what isalpha() returns.  In the ASCII range,
isalpha() should behave the same no matter the current locale.
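
The effect of the combined test can be reproduced in isolation.  This is only
a sketch: the function name is hypothetical, and the comparison against 0x80
is my stand-in for groff's ISASCII() macro, whose exact definition I have not
quoted here.

```cpp
#include <cctype>

// Stand-in for "ISASCII(i) && isalpha(i)": bytes outside the ASCII
// range are rejected before isalpha() is ever consulted, so the
// result for them does not depend on the locale.
static bool ascii_only_isalpha(unsigned char c)
{
  return c < 0x80 && std::isalpha(c) != 0;
}
```

Under this test 'a' is alphabetic, but 0xE9 (a Latin-1 accented letter) is
not, whatever the locale says.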

But a more robust solution may be to call isalpha_l() instead (a POSIX
addition to <ctype.h>, not part of ISO C), so that an ISO-8859-1 locale can be
enforced explicitly.  By doing this and removing the ISASCII() test (from the
csalpha.v[i] line and all the following lines setting other attributes), the
character attributes set in cset_init::cset_init() would be accurate for all
ISO-8859-1 characters, not just ASCII ones.
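
A rough sketch of what the isalpha_l() approach might look like follows.  The
wrapper name is hypothetical, and so is any particular locale name passed to
it: the name of an ISO-8859-1 locale varies between systems, and such a locale
may not be installed at all, which is why the sketch reports failure rather
than guessing.  It assumes a POSIX.1-2008 system where newlocale() is declared
in <locale.h>.

```cpp
#include <ctype.h>
#include <locale.h>

// Returns 1 if c is alphabetic in the named locale, 0 if it is not,
// and -1 if that locale cannot be created (for example because it is
// not installed on this system).
static int isalpha_in_locale(unsigned char c, const char *locale_name)
{
  locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
  if (loc == (locale_t)0)
    return -1;
  int result = isalpha_l(c, loc) != 0;
  freelocale(loc);
  return result;
}
```

Real code would create the locale object once and reuse it for all 256
values, and it would need a fallback for the -1 case, so this approach trades
the ISASCII() restriction for a dependency on the system's installed locales.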

This could have implications beyond the hcode values, of course, and I confess
I'm not familiar enough with groff's internals to determine what they might
be.




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59397>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
