bug-texinfo

Preparation for use of XS paragraph formatting module


From: Gavin Smith
Subject: Preparation for use of XS paragraph formatting module
Date: Mon, 29 Jun 2015 18:27:53 +0100

Hi Patrice and anyone else who cares to comment,

As you may know, I've been rewriting Paragraph.pm, the formatter module
for paragraphs, in C, to be used as a loadable XS module by Perl.
Because text processing in Perl is slow, paragraph formatting takes up
a sizable proportion of the run-time of makeinfo/texi2any when
outputting an Info file.

For comparison, here's the timing of a run using the Perl Paragraph.pm
on the sources of the Emacs Lisp manual (about 3.3 megs of Texinfo
source):

real    0m54.751s
user    0m46.124s
sys     0m0.266s

Now using the C replacement:

real    0m34.367s
user    0m29.865s
sys     0m0.267s

Although the XS module is not complete, I don't expect these numbers
(roughly a 37% reduction in wall-clock time) to change very much.

I hope that this XS module can be completed and integrated into
texi2any. If that can be done, it should be possible to replace other
parts of texi2any as well for speed (notably the parser module, which
is a much bigger job to rewrite).

I'm at the stage now where the choice of whether to use Paragraph.pm
or the XS module (I've called it XSParagraph) is made by commenting
out a single line in Plaintext.pm. This works for running texi2any.pl
from within the source directory; there will be further problems with
installing, distributing, etc. (it probably needs libtool or
something similar).

In order to make this possible, I've made preparatory changes to the
Perl modules, which I am attaching here for review. The changes relate
to the question of whether there should be one space after a full
stop, or two.

As you know, a capital letter before a full stop prevents the full
stop from being treated as the end of a sentence. There is a
complication with constructs like "@sc{a. b.}", which should give the
output "A.  B." and not "A. B.". Currently texi2any deals with this
using a concept of "underlying text": when formatting "A. B." it looks
at a string like "a. b." to decide whether it is at the end of a
sentence.

I've found this use of underlying text hard to understand when reading
the code. I didn't want to write the C code to process underlying text
along with the main text, and there may also be a performance cost in
doing things twice. So I've changed the code to use a different
approach: insert a marker character, which will not appear in the
output, before a ".", "?" or "!" that is allowed to terminate a
sentence in spite of a preceding upper-case letter. This might seem
like a hack, but it won't cause any problems because the marker
character won't otherwise be passed in the argument, and it was easy
to implement the interpretation of it in XSParagraph.
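
To make the marker idea concrete, here is a minimal sketch in C. It is
not the XSParagraph code: the function name add_text, the choice of
'\b' as the marker, and the simplified rules (whitespace runs collapse
to one space, or two after a sentence end; closing quotes and brackets
after the punctuation are ignored) are all assumptions made for
illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker: a control character assumed never to occur in
   text passed to the formatter.  It is consumed, never output. */
#define END_SENTENCE_MARKER '\b'

/* Copy IN to OUT (at most OUT_SIZE bytes including the terminator),
   collapsing each run of whitespace to one space, or to two spaces
   after a sentence end.  A '.', '?' or '!' ends a sentence unless the
   preceding character is an upper-case letter; a marker directly
   before the punctuation overrides that exception. */
static void
add_text (const char *in, char *out, size_t out_size)
{
  size_t n = 0;
  int prev_upper = 0;     /* previous output char was upper-case */
  int marked = 0;         /* marker seen just before this char */
  int end_sentence = 0;   /* previous output char ended a sentence */
  int pending_space = 0;  /* whitespace seen since last output char */

  assert (in != NULL && out != NULL && out_size > 0);

  for (; *in; in++)
    {
      unsigned char c = (unsigned char) *in;

      if (c == END_SENTENCE_MARKER)
        {
          marked = 1;
          continue;
        }
      if (isspace (c))
        {
          pending_space = 1;
          marked = 0;
          continue;
        }

      /* Emit the inter-word space(s) owed from earlier whitespace. */
      if (pending_space && n > 0)
        {
          if (n + 1 < out_size)
            out[n++] = ' ';
          if (end_sentence && n + 1 < out_size)
            out[n++] = ' ';
        }
      pending_space = 0;

      if (strchr (".?!", c))
        end_sentence = marked || !prev_upper;
      else
        end_sentence = 0;

      prev_upper = isupper (c) != 0;
      marked = 0;
      if (n + 1 < out_size)
        out[n++] = (char) c;
    }
  out[n] = '\0';
}
```

With this sketch, the input "A. B." comes out unchanged, while
"A\b. B\b." (a marker before each full stop) comes out as "A.  B.",
which is the behaviour wanted for "@sc{a. b.}".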

I acknowledge that this is a big patch to look at. The most
interesting part of it is the changes to Plaintext.pm, which
demonstrate the interface that the formatter modules now provide. If
anyone has time to have a look at this, or suggest what I'm missing,
it would be appreciated.

"make check" reports 2 failures with these changes, both for tests
which used add_underlying_text directly. When I switch to XSParagraph,
I get 3 failures: the 2 mentioned, plus one that had combining accent
characters in the output. Paragraph.pm was assuming these had width 1
(they were counted by Perl code like "length($word)"), when actually
they have display width 0, which led to a line being wrapped
differently. Output looks like:

   *note ª º ★ £ ⊣ ¿ ®:: *note ⇒ ° a b a sunny day å:: *note Å æ œ Æ Œ ø
Ø ß ł Ł Ð ð Þ þ:: *note ä ẽ î â à é ç ē e̊ e̋ ę:: *note ė ĕ e̲ ẹ ě j
ee͡:: *note ı Ḕ

when it should be

   *note ª º ★ £ ⊣ ¿ ®:: *note ⇒ ° a b a sunny day å:: *note Å æ œ Æ Œ ø
Ø ß ł Ł Ð ð Þ þ:: *note ä ẽ î â à é ç ē e̊ e̋ ę:: *note ė ĕ e̲ ẹ ě j ee͡::
*note ı Ḕ Ḉ

(Don't know how these will show up in the email...) This was in
t/results/converters_tests/at_commands_in_refs_utf8/res_info/at_commands_in_refs_utf8.info
and 
t/results/converters_tests/at_commands_in_refs_utf8/out_info/at_commands_in_refs_utf8.info
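
The difference between character count and display width behind this
failure can be sketched in C as follows. This is an illustration
only, not what XSParagraph does: the function names are made up, and
the width rule is deliberately simplified to treat only the Combining
Diacritical Marks block (U+0300..U+036F) as zero-width and everything
else as one column. A real implementation would more likely rely on
something like wcwidth.

```c
#include <assert.h>
#include <stddef.h>

/* Number of code points in the UTF-8 string S: roughly what Perl's
   length() reports for a decoded string. */
static size_t
utf8_length (const char *s)
{
  size_t n = 0;
  assert (s != NULL);
  for (; *s; s++)
    if (((unsigned char) *s & 0xC0) != 0x80) /* not a continuation byte */
      n++;
  return n;
}

/* Display width of S under the simplified rule described above. */
static size_t
display_width (const char *s)
{
  size_t width = 0;
  assert (s != NULL);
  while (*s)
    {
      unsigned char b0 = (unsigned char) s[0];
      int combining = 0;

      /* U+0300..U+033F encode as 0xCC 0x80..0xBF,
         U+0340..U+036F encode as 0xCD 0x80..0xAF. */
      if (b0 == 0xCC || b0 == 0xCD)
        {
          unsigned char b1 = (unsigned char) s[1];
          if (b0 == 0xCC && b1 >= 0x80 && b1 <= 0xBF)
            combining = 1;
          if (b0 == 0xCD && b1 >= 0x80 && b1 <= 0xAF)
            combining = 1;
        }
      if (!combining)
        width++;

      /* Skip the lead byte and any continuation bytes. */
      s++;
      while (((unsigned char) *s & 0xC0) == 0x80)
        s++;
    }
  return width;
}
```

For "ee" followed by U+0361 (bytes 0xCD 0xA1), as in the "ee͡" above,
utf8_length gives 3 but display_width gives 2, so a formatter that
counts characters fills the line one column early.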

I'd like to make these changes now, although I will need to do more
work and testing on XSParagraph before it can be enabled by default.

Best wishes,
Gavin

Attachment: prepare-for-xsparagraph.patch
Description: Text Data

