bug-texinfo

Re: texi2any is too slow because of Unicode::Collate


From: Gavin Smith
Subject: Re: texi2any is too slow because of Unicode::Collate
Date: Sat, 11 Feb 2023 19:46:12 +0000

On Sat, Feb 11, 2023 at 08:04:15PM +0100, Patrice Dumas wrote:
> On Sat, Feb 11, 2023 at 05:04:45PM +0000, Gavin Smith wrote:
> > I found that texi2any was significantly slower than expected on a
> > manual.
> > 
> > I ran with Devel::NYTProf:
> > 
> > TEXINFO_XS=require perl -d:NYTProf ../texi2any.pl ../../../emacs-lispref-27.2/elisp.texi
> > 
> > The flame graph output shows that 75% of the execution time is spent in
> > Texinfo::Convert::Info::format_printindex, and 70% within
> > Unicode::Collate::cmp.  Here are the top functions:
> > 
> > 
> > Top 15 Subroutines
> > Calls   P  F Exc   Inc      Subroutine
> > 2280071 1  1 23.1s 25.5s    Unicode::Collate::getWt
> > 122770  1  1 14.4s 15.6s    Unicode::Collate::splitEnt
> > 351998  22 1 7.86s 67.2s    Texinfo::Convert::Plaintext::_convert
> > 122770  1  1 6.86s 48.8s    Unicode::Collate::getSortKey
> > 270366  28 1 1.52s 1.59s    Texinfo::Convert::Plaintext::_count_added
> > 2280071 1  1 973ms 973ms    Unicode::Collate::varCE (xsub)
> > 167542  1  1 899ms 1.26s    Texinfo::Convert::Plaintext::_process_text
> > 184832  8  2 842ms 842ms    Texinfo::Convert::Paragraph::add_text (xsub)
> > 2280071 1  1 724ms 724ms    Unicode::Collate::_fetch_simple (xsub)
> > 2280071 1  1 550ms 550ms    Unicode::Collate::_ignorable_simple (xsub)
> > 4564446 8  1 530ms 530ms    Unicode::Collate::CORE:match (opcode)
> > 2280071 1  1 508ms 508ms    Unicode::Collate::_exists_simple (xsub)
> > 62010   1  1 463ms 49.7s    Texinfo::Structuring::_collator_sort_string
> > 122770  1  1 444ms 622ms    Unicode::Collate::process
> > 1       1  1 434ms 434ms    Texinfo::Parser::parse_file (xsub)
> > 
> > 
> > Can we avoid using Unicode::Collate as much?
> 
> Maybe a possibility could be to split by letter and sort each letter
> separately, as is done in HTML.  I have no idea whether it would really
> be faster.  It could be, since the sorting time may grow more than
> proportionally with the length of words or the number of words.
> 
> Other than that, I do not have any idea besides disabling it, for
> instance if documentlanguage is en.  The result with Unicode::Collate is
> better for accented letters, but that is not so useful in English.  There
> could even be a customization variable to use Unicode::Collate even in
> English.
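
For illustration, the split-by-letter idea could look roughly like the
sketch below.  The helper name sort_by_letter and the sample entries are
made up for this example and are not taken from the texi2any sources.
Whether this is actually faster is untested, and the resulting order can
differ from a single full collation pass (for instance, accented and
unaccented initials end up in different buckets).

```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical helper: bucket entries by upper-cased first character,
# then run the collator only within each bucket.
sub sort_by_letter {
    my ($collator, @entries) = @_;
    my %buckets;
    for my $entry (@entries) {
        my $letter = uc(substr($entry, 0, 1));
        push @{ $buckets{$letter} }, $entry;
    }
    # The bucket keys still need one collation pass, but only over the
    # distinct first letters, which is a much shorter list.
    return map { $collator->sort(@{ $buckets{$_} }) }
           $collator->sort(keys %buckets);
}

my $collator = Unicode::Collate->new();
my @sorted = sort_by_letter($collator, qw(car apple cdr append buffer));
print join(' ', @sorted), "\n";
```

Each call to the collator then compares only entries that share an
initial, so fewer comparisons go through Unicode::Collate overall.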

I think it's a good idea to disable it for "en" at least, along with
a customization variable.
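
A minimal sketch of what that could look like, under stated assumptions:
the helper index_entry_cmp, the plain hash standing in for the converter
configuration, and the variable name USE_UNICODE_COLLATION are all
hypothetical names chosen for this example.  Treating an unset
documentlanguage as "en" is also an assumption.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical helper: returns a comparison sub for index entries.
# $conf is a plain hash standing in for the real configuration.
sub index_entry_cmp {
    my ($conf) = @_;
    my $lang  = $conf->{documentlanguage} // 'en';  # assumed default
    my $force = $conf->{USE_UNICODE_COLLATION};     # hypothetical variable
    if ($force or $lang !~ /^en(?:_|$)/) {
        # Full Unicode collation: better for accented letters, but slow.
        my $collator = Unicode::Collate->new();
        return sub { $collator->cmp($_[0], $_[1]) };
    }
    # Cheap fallback: case-insensitive comparison with Perl's built-in
    # cmp, falling back to a case-sensitive comparison to break ties.
    return sub { lc($_[0]) cmp lc($_[1]) || $_[0] cmp $_[1] };
}

my $cmp = index_entry_cmp({ documentlanguage => 'en' });
my @sorted = sort { $cmp->($a, $b) } qw(defun Buffer car);
print "@sorted\n";
```

With documentlanguage set to "en" and the variable unset, no
Unicode::Collate object is created at all; setting the variable brings
the collator back for English documents.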



