bug-texinfo

Re: Preparation for use of XS paragraph formatting module


From: Patrice Dumas
Subject: Re: Preparation for use of XS paragraph formatting module
Date: Mon, 29 Jun 2015 21:24:44 +0200
User-agent: Mutt/1.5.20 (2009-12-10)

On Mon, Jun 29, 2015 at 06:27:53PM +0100, Gavin Smith wrote:
> Hi Patrice and anyone else who cares to comment,
> 
> For comparison, here's the timing of a run using the Perl Paragraph.pm
> on the sources of the Emacs Lisp manual (about 3.3 megs of Texinfo
> source):
> 
> real    0m54.751s
> user    0m46.124s
> sys     0m0.266s
> 
> Now using the C replacement:
> 
> real    0m34.367s
> user    0m29.865s
> sys     0m0.267s

That's good!

> As you know, a capital letter before a full stop suppresses an end of
> sentence. There is a complication with constructs like "@sc{a. b.}"
> which should give the output "A.  B." and not "A. B.". Currently
> texi2any deals with this with a concept of "underlying text": when
> formatting "A. B." it looks at a string like "a. b." to decide if it
> is at the end of a sentence.
> 
> I've found this use of underlying text hard to understand when reading
> the code. 

It was meant to be simple, if neither efficient nor smart: it is simply
an accumulation of the text without formatting.

> I didn't want to write the C code to process underlying text
> along with the main text, and also there may be performance
> implications in doing things twice. So I've changed the code to use a
> different approach. This is to insert a marker character, that will
> not appear in the output, before a ., ? or ! which is allowed to
> terminate a sentence in spite of a preceding upper-case letter. This
> might seem like a hack, but it won't cause any problems because the
> marker character used won't be passed in the argument otherwise, and
> it was easy to implement the interpretation of this in XSParagraph.

I don't much like that kind of trick, but if it works...  Also,
Paragraph.pm, Line.pm and Unfilled.pm were meant to be independent
modules with a (well-)defined API.  With this change, it seems that the
calling code may need to know that a specific marker character can be
inserted.

> I acknowledge that this is a big patch to look at. The most
> interesting part of it is the changes to Plaintext.pm, which
> demonstrates the interface that the formatter modules now provide. If
> anyone has time to have a look at this, or suggest what I'm missing,
> it would be appreciated.

I didn't see anything obvious, but I didn't really understand it either.

> "make check" reports 2 failures with these changes, both for tests
> which used add_underlying_text directly. When I switch to XSParagraph,
> I get 3 failures: the 2 mentioned, plus one that had accent combining
> characters in the output, which Paragraph.pm was assuming had width 1
> (they were counted by Perl code like "length($word)"), when actually
> they had display width 0, leading to a line being wrapped differently.
> Output looks like:
> 
>    *note ª º ★ £ ⊣ ¿ ®:: *note ⇒ ° a b a sunny day å:: *note Å æ œ Æ Œ ø
> Ø ß ł Ł Ð ð Þ þ:: *note ä ẽ î â à é ç ē e̊ e̋ ę:: *note ė ĕ e̲ ẹ ě j
> ee͡:: *note ı Ḕ
> 
> when it should be
> 
>    *note ª º ★ £ ⊣ ¿ ®:: *note ⇒ ° a b a sunny day å:: *note Å æ œ Æ Œ ø
> Ø ß ł Ł Ð ð Þ þ:: *note ä ẽ î â à é ç ē e̊ e̋ ę:: *note ė ĕ e̲ ẹ ě j ee͡::
> *note ı Ḕ Ḉ

Isn't the new output actually the better one here?

-- 
Pat


