Re: index sorting in texi2any in C issue with spaces
From: pertusus
Subject: Re: index sorting in texi2any in C issue with spaces
Date: Sun, 4 Feb 2024 12:25:49 +0100
On Sun, Feb 04, 2024 at 12:55:36PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 4 Feb 2024 11:42:52 +0100
> > From: pertusus@free.fr
> > Cc: Gavin Smith <gavinsmith0123@gmail.com>, bug-texinfo@gnu.org
> >
> > On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > > I think en_US.utf-8 is (or at least can be by default) a combination
> > > of @documentlanguage and @documentencoding.
> >
> > I try to make the index collation as independent as possible of
> > @documentencoding and output encoding. Here the utf-8 is meant to
> > provide a sorting 'independent' of the encoding.
>
> Why is that a good idea? Presumably, a manual whose language is
> provided by @documentlanguage is indeed written in that language, and
> so the collation should be according to that language? Or what am I
> missing?
My point above is not about @documentlanguage; it is about
@documentencoding. Regarding @documentlanguage, I agree that using it
to select the collation could be an interesting option.
> If we want collation which uses only codepoints, disregarding any
> collation weights defined by the Unicode TR10, we could use
> en_US.utf-8, but then, as Gavin says, using the glibc collation function
> you get more than you asked for, because weights are not ignored. So we
> need to use something else in the C variant of collation code, AFAIU.
Indeed, but I have no idea what to use for now.
> > Regarding the language, for now the aim was to have something as
> > similar as possible to the Perl output, which is obtained without a
> > locale. The
> > choice of en_US was motivated by that aim. I looked at the
> > /usr/lib/locale/*/LC_COLLATE files on my debian GNU/Linux and there was
> > no "en.utf-8", which would have been my first choice, so I used
> > "en_US.utf-8".
>
> I don't know enough about what Perl does in the module you are using.
It implements Unicode TR10 (the Unicode Collation Algorithm), and we
pass an option that sets Weighting to Non-ignorable.
> "Obtained without a locale" means what exactly? a collation order that
> only considers the Unicode codepoints of the characters?
In that context, I mean a collation that follows Unicode TR10, with
Weighting set to Non-ignorable if possible, and without language
tailoring.
> Or does it
> mean something else? If it only considers the codepoints, then
> collation in C using glibc functions will NOT produce the same order
> even under en_US.utf-8, AFAIU.
--
Pat
- Re: index sorting in texi2any in C issue with spaces, Eli Zaretskii, 2024/02/01
- Re: index sorting in texi2any in C issue with spaces, Eli Zaretskii, 2024/02/01
- Re: index sorting in texi2any in C issue with spaces, Patrice Dumas, 2024/02/01
- Re: index sorting in texi2any in C issue with spaces, Gavin Smith, 2024/02/01
- Re: index sorting in texi2any in C issue with spaces, Eli Zaretskii, 2024/02/02
- Re: index sorting in texi2any in C issue with spaces, pertusus, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Eli Zaretskii, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Andreas Schwab, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, pertusus, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, pertusus, 2024/02/04 (this message)
- Re: index sorting in texi2any in C issue with spaces, Gavin Smith, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Eli Zaretskii, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Patrice Dumas, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Patrice Dumas, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Werner LEMBERG, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Gavin Smith, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Patrice Dumas, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Werner LEMBERG, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Gavin Smith, 2024/02/04
- Re: index sorting in texi2any in C issue with spaces, Patrice Dumas, 2024/02/04