Re: texi2any is too slow because of Unicode::Collate
From: Patrice Dumas
Subject: Re: texi2any is too slow because of Unicode::Collate
Date: Sat, 11 Feb 2023 20:04:15 +0100

On Sat, Feb 11, 2023 at 05:04:45PM +0000, Gavin Smith wrote:
> I found that texi2any was significantly slower than expected on a
> manual.
>
> I ran with Devel::NYTProf:
>
> TEXINFO_XS=require perl -d:NYTProf ../texi2any.pl ../../../emacs-lispref-27.2/elisp.texi
>
> The flame graph output shows that 75% of the execution time is spent in
> Texinfo::Convert::Info::format_printindex, and 70% within
> Unicode::Collate::cmp. Here are the top functions:
>
>
> Top 15 Subroutines
>   Calls   P  F    Exc    Inc  Subroutine
> 2280071   1  1  23.1s  25.5s  Unicode::Collate::getWt
>  122770   1  1  14.4s  15.6s  Unicode::Collate::splitEnt
>  351998  22  1  7.86s  67.2s  Texinfo::Convert::Plaintext::_convert
>  122770   1  1  6.86s  48.8s  Unicode::Collate::getSortKey
>  270366  28  1  1.52s  1.59s  Texinfo::Convert::Plaintext::_count_added
> 2280071   1  1  973ms  973ms  Unicode::Collate::varCE (xsub)
>  167542   1  1  899ms  1.26s  Texinfo::Convert::Plaintext::_process_text
>  184832   8  2  842ms  842ms  Texinfo::Convert::Paragraph::add_text (xsub)
> 2280071   1  1  724ms  724ms  Unicode::Collate::_fetch_simple (xsub)
> 2280071   1  1  550ms  550ms  Unicode::Collate::_ignorable_simple (xsub)
> 4564446   8  1  530ms  530ms  Unicode::Collate::CORE:match (opcode)
> 2280071   1  1  508ms  508ms  Unicode::Collate::_exists_simple (xsub)
>   62010   1  1  463ms  49.7s  Texinfo::Structuring::_collator_sort_string
>  122770   1  1  444ms  622ms  Unicode::Collate::process
>       1   1  1  434ms  434ms  Texinfo::Parser::parse_file (xsub)
>
>
> Can we avoid using Unicode::Collate as much?

One possibility would be to split the entries by first letter and sort
each letter's group separately, as is done for HTML. I have no idea
whether that would really be faster; it could be, since the sorting time
may grow more than proportionally with the length of the words or the
number of words.
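
To make the split-by-letter idea concrete, here is a minimal sketch. It is written in Python, not in texi2any's Perl, and it does not reproduce the HTML converter's actual code. The names `collation_key` and `sort_by_letter` are hypothetical, and `collation_key` is only a cheap stand-in for a real collation key (it strips accents and case-folds), not an equivalent of Unicode::Collate.

```python
from collections import defaultdict
import unicodedata


def collation_key(text):
    """Stand-in for an expensive collation key (an assumption, not
    Unicode::Collate): drop combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original text is a tie-breaker so the ordering is deterministic.
    return (base.casefold(), text)


def sort_by_letter(entries):
    """Group entries by their first base letter, then sort each group."""
    groups = defaultdict(list)
    for entry in entries:
        # "É" and "e" both land in the "E" group.
        letter = collation_key(entry[:1])[0].upper()
        groups[letter].append(entry)
    result = []
    for letter in sorted(groups):
        # Each sort only compares entries that share a first letter.
        result.extend(sorted(groups[letter], key=collation_key))
    return result


entries = ["zebra", "Éclair", "apple", "eclipse", "Apply", "écho"]
by_letter = sort_by_letter(entries)
whole_list = sorted(entries, key=collation_key)
```

With this stand-in key, both approaches give the same order for the sample entries. Whether grouping saves time in practice would depend on how much of the cost comes from comparisons between entries rather than from building each key, which only measurement can tell.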

Other than that, I do not have many ideas beyond disabling
Unicode::Collate, for instance when documentlanguage is en. The result
with Unicode::Collate is better for accented letters, but that is not so
useful in English. There could even be a customization variable to use
Unicode::Collate even in English.
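
The language-based switch could look roughly like the sketch below, again in Python rather than texi2any's Perl. Everything here is hypothetical: `choose_sort_key`, `simple_key`, `accent_aware_key` and the `force_collation` flag (standing in for the suggested customization variable) are illustrative names, and `accent_aware_key` is only a rough substitute for real Unicode collation.

```python
import unicodedata


def simple_key(text):
    """Cheap key: case-insensitive code point order."""
    return text.casefold()


def accent_aware_key(text):
    """Rough stand-in for Unicode collation (an assumption): sort
    accented letters next to their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


def choose_sort_key(document_language, force_collation=False):
    """Pick the cheap key for English unless collation is forced.

    force_collation is a hypothetical customization variable.
    """
    language = document_language.split("_")[0]  # "en_US" -> "en"
    if force_collation or language != "en":
        return accent_aware_key
    return simple_key


entries = ["Zoo", "élan", "apple"]
english_order = sorted(entries, key=choose_sort_key("en"))
french_order = sorted(entries, key=choose_sort_key("fr"))
forced_order = sorted(entries, key=choose_sort_key("en", force_collation=True))
```

The sample entries show the trade-off mentioned above: with the cheap key, "élan" sorts after "Zoo", while the accent-aware key places it between "apple" and "Zoo".
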
--
Pat