bug-texinfo

Re: `texindex` output depends on locale settings


From: Eli Zaretskii
Subject: Re: `texindex` output depends on locale settings
Date: Sun, 06 Nov 2022 20:55:12 +0200

> From: arnold@skeeve.com
> Date: Sun, 06 Nov 2022 11:25:27 -0700
> Cc: bug-texinfo@gnu.org, arnold@skeeve.com
> 
> Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > > and similarly capable C libraries
> >
> > Are there such libraries in existence, when locale data is considered?
> > Which ones?
> 
> macOS, and Solaris, to name two. I think AIX as well.

That's not what I know.  I think glibc is quite unique.

> Obviously texindex, and gawk underneath it, can't do more than what
> the underlying C library and installed locales enable.  But on systems
> where they can (which is not just GLIBC), it should be possible to do
> more than they currently do now.

You'll just trade one set of bug reports for another, that's all.
There's no way to ensure on POSIX systems that an arbitrary locale is
installed.  (Ironically, Windows is in much better shape here.)  So
the problems will remain, and their manifestations will be as hard to
understand as they are now; they will just be different, and will
probably involve quite a bit of mojibake.

If we want to solve this properly, we need to decode the text into the
internal UTF-8 encoding, process it in UTF-8, and then encode it back
when writing the index.  That probably means we should either add
such capabilities to Gawk or do it with some tool other than Gawk.  I
think the latter is more practical, unfortunately, since Gawk doesn't
really have i18n capabilities: it can only use a single locale, and
that locale must be externally installed.  Since texi2any just went
through the same process, I think Perl is probably a good candidate to
replace Gawk as an implementation language for texindex.  Another
possibility is Python.
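
To make the decode/process/encode idea concrete, here is a minimal
sketch in Python.  It is illustrative only and is not texindex's actual
code: the names `sort_key` and `sort_index` are hypothetical, the input
is assumed to be one index entry per line, and the sort key is a crude
accent- and case-folding approximation, not a real collation algorithm.
The point is only that the result does not depend on which locales are
installed.

```python
import unicodedata


def sort_key(entry: str) -> tuple:
    """Locale-independent sort key (an assumption, not real collation)."""
    # Decompose accented characters, then drop the combining marks so
    # that an accented letter sorts next to its unaccented base letter.
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case for the primary comparison; use the original entry as a
    # tie-breaker so the ordering is deterministic.
    return (base.casefold(), entry)


def sort_index(raw: bytes, encoding: str) -> bytes:
    """Decode the entries, sort them as Unicode text, and encode back."""
    text = raw.decode(encoding)          # decode from the document encoding
    entries = text.splitlines()          # one index entry per line (assumed)
    entries.sort(key=sort_key)           # process as Unicode, no locale used
    return "".join(e + "\n" for e in entries).encode(encoding)
```

Calling `sort_index(data, "latin-1")` and `sort_index(data, "utf-8")`
on the same entries yields the same order regardless of the LANG or
LC_ALL settings in effect, because the locale is never consulted.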


