bug-groff

Re: minor hyphenation issue


From: Dave Kemper
Subject: Re: minor hyphenation issue
Date: Tue, 23 May 2017 18:30:25 -0500

On 5/21/17, Barbara Beeton <address@hidden> wrote:
> not a legal requirement that would hold up in court,
> but a courtesy to knuth, which I suspect would be
> backed by a large segment of the computer science
> community,

I've no interest in trying to unseat tradition.  What I wondered was
whether it's practical to create a superset file that, when processed
to remove non-ASCII lines, generates the historical Knuthian pattern
file.  That would preserve the historical behavior unchanged without
impeding modern relevance.
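
A minimal sketch of that filtering idea, assuming the superset file
holds one pattern per line and that "non-ASCII line" means any line
containing a character outside 7-bit ASCII.  The function name and the
sample patterns are made up for illustration; they are not taken from
any real pattern file or from groff.

```python
def strip_non_ascii_lines(lines):
    """Return only the lines made up entirely of ASCII characters."""
    # str.isascii() requires Python 3.7 or later.
    return [line for line in lines if line.isascii()]


# Hypothetical superset: two ASCII patterns plus one with an accented letter.
superset = [".ach4", "4z1z2", "na\u00efve1"]

# What would remain is the ASCII-only subset, in its original order.
historical = strip_non_ascii_lines(superset)
print(historical)
```

Whether the result reproduces the historical file byte for byte would
depend on the superset keeping the original lines unmodified and in the
original order, which this sketch simply assumes.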

But Karl Berry points to perhaps a better way forward for groff:

On 5/22/17, Karl Berry <address@hidden> wrote:
> 1) Gerard Kuikens created, years ago, a huge set of additional patterns
> for US English. As I recall, they covered all known exceptions at the
> time he made them. They have been available in TeX Live as language
> "usenglishmax" (among other names). As far as I know, he is still
> willing to maintain it, if anyone had bugs/requests. The patterns are
> (nowadays) in TL's file
> texmf-dist/tex/generic/hyph-utf8/patterns/txt/hyph-en-us.pat.txt

Does it make sense for groff to use a pattern list that can be updated
as needed, rather than one frozen by tradition?  Is the one cited
above a good choice?

On my system, texmf-dist/tex/generic/hyph-utf8/patterns/txt/hyph-en-us.pat.txt
contains only ASCII, while many other files in this directory have
UTF-8 characters.  This implies to me that there's no technical
limitation to adding non-ASCII patterns to hyph-en-us.pat.txt -- is
that accurate?
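
For anyone who wants to repeat the ASCII check on their own copy, here
is a rough sketch.  It assumes the file contents have already been read
as raw bytes; the function name and the sample data are illustrative
only.

```python
def non_ascii_line_numbers(data):
    """Return 1-based numbers of the lines that contain non-ASCII bytes."""
    # bytes.isascii() requires Python 3.7 or later.
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if not line.isascii()
    ]


# Made-up sample: the third line holds a UTF-8 encoded i-diaeresis.
sample = b".ach4\n4z1z2\nna\xc3\xafve1\n"
flagged = non_ascii_line_numbers(sample)
print(flagged)
```

For a real file, the bytes could come from open(path, "rb").read(),
and an empty result would mean the file is pure ASCII.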

> 2) Although we certainly aren't going to change the default typesetting
> done by "tex" (or "latex" or "pdflatex", or, I suppose, "groff"), I see
> nothing in principle that stops the addition of UTF-8/Latin-N/whatever
> patterns, to be enabled in a given document. The frozenness of Knuth's
> patterns, while certainly true, is not a block to moving forward.

In groff, I think a better design decision is to break all English
words correctly by default rather than requiring an option or request
to enable such behavior.  But it's a hypothetical decision until base
groff knows how to handle words with accented characters at all.

> 3) Besides Liang's thesis, you may be interested in the
> information/links about the current state of TeX hyphenation at
> http://tug.org/tex-hyphen. Also Mojca and Arthur's paper (they are the
> instigators and principal maintainers of hyph-utf8) last year about it:
> http://tug.org/TUGboat/tb37-2/tb116miklavec.pdf
>
> best,
> karl


