Re: texi2any is too slow because of Unicode::Collate
From: Gavin Smith
Date: Sat, 11 Feb 2023 20:30:07 +0000

On Sat, Feb 11, 2023 at 10:02:55PM +0200, Eli Zaretskii wrote:
> > From: Gavin Smith <gavinsmith0123@gmail.com>
> > Date: Sat, 11 Feb 2023 19:46:12 +0000
> >
> > On Sat, Feb 11, 2023 at 08:04:15PM +0100, Patrice Dumas wrote:
> > > Other than that, the only idea I have is to disable it, for
> > > instance if documentlanguage is en. The result with Unicode::Collate is
> > > better for accented letters, but not so useful in English. There could
> > > even be a customization variable to use Unicode::Collate even in
> > > English.
> >
> > I think it's a good idea to disable it for "en" at least, along with
> > a customization variable.
>
> How many manuals set documentlanguage? With the proliferation of
> documentencoding set to UTF-8, I think disabling the collation for
> "en" will be next to futile.

If I understand correctly, until recently more standard Perl facilities
were used for sorting the indices, but this produced worse results for
non-English text, such as text containing many accented characters.
Unicode::Collate is now used to sort the indices "properly". Whether
UTF-8 is used may not be a relevant factor.
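To illustrate the kind of difference meant here, a minimal Python sketch (not texi2any's Perl code, and not the earlier Perl method) contrasting plain code-point order, roughly what a default string comparison such as Perl's `sort` on character strings gives, with a crude accent- and case-insensitive key built from `unicodedata`. The `folded_key` function is a stand-in assumption; it is far simpler than the Unicode Collation Algorithm that Unicode::Collate implements.

```python
import unicodedata

entries = ["zebra", "Éclair", "eclipse", "apple", "Ábaco"]

def codepoint_key(s):
    # Plain code-point order: "Á" (U+00C1) and "É" (U+00C9)
    # sort after every ASCII letter.
    return s

def folded_key(s):
    # Crude stand-in for collation (NOT the Unicode Collation Algorithm):
    # decompose to NFD, drop combining accents, ignore case; the original
    # string breaks ties so the order is deterministic.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)

print(sorted(entries, key=codepoint_key))  # ['apple', 'eclipse', 'zebra', 'Ábaco', 'Éclair']
print(sorted(entries, key=folded_key))     # ['Ábaco', 'apple', 'Éclair', 'eclipse', 'zebra']
```

With code-point order the accented entries end up after "zebra", which is the sort of result that looks wrong in an index for a language with many accented words.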

Could we investigate further which languages the old method causes
problems for? The old method might be okay for more languages than just
English.
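A sketch of how the proposal in this thread could be structured: the cheap sort for "en" (and possibly other languages), collation otherwise, plus a customization variable to force collation. All names here (`FAST_SORT_LANGUAGES`, `choose_sort_key`, `use_collation`) are hypothetical, not existing texi2any options, and `collation_key` is only a stand-in for Unicode::Collate. The sketch treats an unset documentlanguage as unknown and keeps collation, which is why Eli's question about how many manuals set it matters.

```python
import unicodedata
from typing import Callable, Optional

# Hypothetical: languages for which plain code-point order is assumed
# acceptable. Which languages belong here is the open question; only
# "en" has been proposed so far.
FAST_SORT_LANGUAGES = {"en"}

def codepoint_key(s: str) -> str:
    # Cheap: compare strings by Unicode code point.
    return s

def collation_key(s: str) -> str:
    # Stand-in for an expensive, language-aware collation key
    # (Unicode::Collate in texi2any); here just accent- and case-folding.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

def choose_sort_key(document_language: Optional[str],
                    use_collation: bool = False) -> Callable[[str], str]:
    # `use_collation` models a hypothetical customization variable that
    # forces collation even for English.
    if use_collation:
        return collation_key
    if document_language:
        primary = document_language.split("_")[0].lower()  # "en_US" -> "en"
        if primary in FAST_SORT_LANGUAGES:
            return codepoint_key
    # Unset or unlisted language: keep the slower, better-quality sort.
    return collation_key

names = ["zebra", "Éclair", "apple"]
print(sorted(names, key=choose_sort_key("en")))  # ['apple', 'zebra', 'Éclair']
print(sorted(names, key=choose_sort_key("fr")))  # ['apple', 'Éclair', 'zebra']
```

Extending the fast path to more languages would then only mean adding codes to the set once they have been checked.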

> How come format_printindex takes such a large proportion of the
> processing? Isn't that strange? Index entries are usually a small
> proportion of the overall manual's text, so processing the manual
> should take the lion's share. The index in the manual you were timing
> has about 8K entries, but the entire manual is 100K lines, so the
> index is less than 10% of the total volume. How come its processing
> is so expensive?

It's the sorting of the index entries into alphabetical order, I presume.
There isn't a similar sorting process for the rest of the manual.
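A rough way to see why sorting can dominate: a comparison sort of n entries does on the order of n log2 n comparisons, about 8000 × 13 ≈ 104,000 for an 8K-entry index, so any per-comparison cost is multiplied accordingly. The Python sketch below counts key computations when a comparison function recomputes keys on every comparison versus when each key is computed once (`key=`, the decorate-sort-undecorate idea, called the Schwartzian transform in Perl). `expensive_key` is a hypothetical stand-in for a costly collation key; this is a general technique, not a description of what texi2any currently does.

```python
import functools
import random

calls = 0

def expensive_key(s: str) -> str:
    # Hypothetical stand-in for a costly collation key; counts invocations.
    global calls
    calls += 1
    return s.casefold()

def sort_recomputing(entries):
    # Both keys are recomputed on every comparison, as when a collator's
    # compare routine is called inside the sort.
    def cmp(a, b):
        ka, kb = expensive_key(a), expensive_key(b)
        return (ka > kb) - (ka < kb)
    return sorted(entries, key=functools.cmp_to_key(cmp))

def sort_cached(entries):
    # key= computes each entry's key exactly once, then sorts on the keys.
    return sorted(entries, key=expensive_key)

random.seed(0)
entries = [f"entry{random.randrange(10**6):06d}" for _ in range(8000)]

calls = 0
recomputed = sort_recomputing(entries)
recomputing_calls = calls

calls = 0
cached = sort_cached(entries)
cached_calls = calls

print(recomputing_calls, cached_calls)
```

Both sorts are stable and see the same comparison outcomes, so they return the same order; only the number of key computations differs (two per comparison versus one per entry).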