Re: index sorting in texi2any in C issue with spaces
From: Patrice Dumas
Subject: Re: index sorting in texi2any in C issue with spaces
Date: Wed, 31 Jan 2024 23:11:02 +0100
On Wed, Jan 31, 2024 at 08:10:56PM +0000, Gavin Smith wrote:
> On Wed, Jan 31, 2024 at 10:15:08AM +0100, Patrice Dumas wrote:
> > Hello,
> >
> > I implemented index sorting in C with XS interface in texi2any.
> > When Unicode collation is wanted, based on my understanding of
> > Eli's suggestions, a collation locale is set to "en_US.utf-8", by
> > newlocale (LC_COLLATE_MASK, "en_US.utf-8", 0)
> > and then strxfrm_l is used (which should be the same as using
> > strcoll_l). This applies to conversion in C/with XS, enabled with the
> > environment variable TEXINFO_XS_CONVERT=1, for now only for HTML, and
> > only if the TEST customization variable is not set.
>
> It seems like a pretty obscure interface. It is barely
> documented - newlocale is in the Linux Man Pages but not the
> glibc manual, and strxfrm_l was only in the Posix standard
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html).
> I don't know of any other way of accessing the collation functionality.
>
> Do you know how portable it is?
I guess it is not very portable, but according to internet searches it
seems to exist in some form on macOS and Windows. If it does not work
the same way on those platforms, we could accept patches.
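For reference, here is a minimal sketch of the interface under discussion. It is not the actual texi2any code: the helper names collation_key and compare_entries are hypothetical, and the fallback to strcmp when the locale cannot be created is an assumption made only for illustration. It builds a collation key with strxfrm_l in a locale created by newlocale and compares the keys with strcmp.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for newlocale and strxfrm_l */
#endif
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd collation key for STR in LOC,
   or NULL if memory could not be allocated. */
static char *
collation_key (const char *str, locale_t loc)
{
  /* With a size of 0, strxfrm_l only reports the length needed
     (not counting the terminating NUL). */
  size_t len = strxfrm_l (NULL, str, 0, loc);
  char *key = malloc (len + 1);
  if (key)
    strxfrm_l (key, str, len + 1, loc);
  return key;
}

/* Hypothetical helper: compare two index entries using the collation
   rules of LOCALE_NAME.  Assumption for this sketch: if the locale is
   not available on the system, fall back to plain byte order. */
int
compare_entries (const char *a, const char *b, const char *locale_name)
{
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  char *key_a, *key_b;
  int result;

  if (loc == (locale_t) 0)
    return strcmp (a, b);

  key_a = collation_key (a, loc);
  key_b = collation_key (b, loc);
  /* Comparing transformed keys with strcmp orders strings the same way
     strcoll_l would order the original strings. */
  result = (key_a && key_b) ? strcmp (key_a, key_b) : strcmp (a, b);
  free (key_a);
  free (key_b);
  freelocale (loc);
  return result;
}
```

A call such as compare_entries (entry1, entry2, "en_US.utf-8") returns a value with the same sign convention as strcmp. On platforms where newlocale or strxfrm_l is missing, this would not compile, which is the portability concern raised above.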
> The documentation for the corresponding
> Gnulib module says the following:
>
> Portability problems not fixed by Gnulib:
>
> This function is missing on many platforms: FreeBSD 6.0, NetBSD 5.0,
> OpenBSD 6.0, Minix 3.1.8, AIX 5.1, HP-UX 11, IRIX 6.5, Solaris 11.3,
> Cygwin 1.7.x, mingw, MSVC 14, Android 4.4.
>
> <https://www.gnu.org/software/gnulib/manual/html_node/strxfrm_005fl.html>
>
> Could it be possible to have an option of "current locale" collation
> which could use more standard interfaces?
That could be possible; it would use strxfrm.
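A rough sketch of what "current locale" collation with only standard C interfaces might look like, assuming the caller has already selected the locale with setlocale (LC_COLLATE, ""). The function name sort_entries_current_locale and the keyed_entry structure are hypothetical and only serve the example; keys are computed once per entry with strxfrm and then sorted with qsort.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of an entry with its precomputed collation key. */
struct keyed_entry {
  const char *text;
  char *key;
};

static int
compare_keyed (const void *a, const void *b)
{
  const struct keyed_entry *ea = a;
  const struct keyed_entry *eb = b;
  return strcmp (ea->key, eb->key);
}

/* Sort ENTRIES in place according to the current LC_COLLATE locale.
   Returns 0 on success, -1 if memory could not be allocated (in which
   case ENTRIES is left unchanged). */
int
sort_entries_current_locale (const char **entries, size_t n)
{
  struct keyed_entry *keyed;
  size_t i;
  int status = 0;

  if (n == 0)
    return 0;
  keyed = calloc (n, sizeof *keyed);
  if (!keyed)
    return -1;

  for (i = 0; i < n; i++)
    {
      /* First call reports the key length, second call fills the key. */
      size_t len = strxfrm (NULL, entries[i], 0);
      keyed[i].text = entries[i];
      keyed[i].key = malloc (len + 1);
      if (!keyed[i].key)
        {
          status = -1;
          break;
        }
      strxfrm (keyed[i].key, entries[i], len + 1);
    }

  if (status == 0)
    {
      qsort (keyed, n, sizeof *keyed, compare_keyed);
      for (i = 0; i < n; i++)
        entries[i] = keyed[i].text;
    }

  for (i = 0; i < n; i++)
    free (keyed[i].key);
  free (keyed);
  return status;
}
```

The drawback compared to newlocale is that setlocale changes process-wide state, and the result depends on the locale of the user running the program.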
> Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> There may be language-specific "tailoring" for other languages (e.g.
> Swedish) that the user may wish to use instead. Hence, it may be
> a good idea to allow use of a user-specified locale for collation through
> the C code.
That would not be difficult to implement as a customization variable.
What about COLLATION_LANGUAGE?
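To illustrate the idea, here is a sketch of how a user-specified locale could be tried first, with a fallback to the current default. How the value of COLLATION_LANGUAGE would map to a locale name is not decided; this example simply assumes the value is passed as a full locale name such as "sv_SE.utf-8", and the function open_collation_locale is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for newlocale */
#endif
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try REQUESTED (may be NULL), then "en_US.utf-8",
   then "C", and return the first collation locale that can be created.
   If CHOSEN is not NULL, it receives the name that was used, or NULL if
   no locale could be created.  The caller releases the result with
   freelocale. */
locale_t
open_collation_locale (const char *requested, const char **chosen)
{
  const char *candidates[3];
  size_t i;

  candidates[0] = requested;
  candidates[1] = "en_US.utf-8";
  candidates[2] = "C";

  for (i = 0; i < 3; i++)
    {
      locale_t loc;
      if (!candidates[i])
        continue;
      loc = newlocale (LC_COLLATE_MASK, candidates[i], (locale_t) 0);
      if (loc != (locale_t) 0)
        {
          if (chosen)
            *chosen = candidates[i];
          return loc;
        }
    }

  if (chosen)
    *chosen = NULL;
  return (locale_t) 0;
}
```

Reporting the name actually used would make it possible to warn the user when the requested locale is not installed on the system.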
> I expect it would require creating a glibc locale to change the collation
> order, which is not something we can do.
I agree. I suppose that the best we can do is allow for the Perl
collation to be used and document the differences.
--
Pat