bug-texinfo

"special" spaces in Texinfo parsing and output


From: Patrice Dumas
Subject: "special" spaces in Texinfo parsing and output
Date: Sat, 23 Mar 2013 14:06:56 +0100
User-agent: Mutt/1.5.20 (2009-12-10)

Hello,

The issue reported by Ludovic led me to look at how space characters
are handled in general in the perl makeinfo implementation.  The \s
character class (and \S, which is simply its complement) is used all
over, in the Parser and in the converters.  The Parser should not remove
any character (except maybe in very specific circumstances involving
user errors, which are not worth looking at).  However, the
interpretation of spaces matters when constructing the tree, for
instance for delimiting paragraphs or handling spaces after commands.
In most converters, all the spaces are output, so input space characters
are kept as is.  But in Plaintext/Info, spaces are removed as part of
paragraph and line formatting.  In addition, lines consisting only of
spaces are emptied in @example and the like, and lines consisting only
of spaces between paragraphs are completely removed.

Now, what is in \s?  It turns out that it is not that simple.  It is
explained in the 'Whitespace' part of
http://perldoc.perl.org/perlrecharclass.html#Backslash-sequences
The smallest set is [\t\n\f\r ], which includes the '^L' character
(\f).  But depending on the settings, there may be additional
characters, like '0x2000 EN QUAD'.  I have tested that all of those
appear in HTML output, but none of them in Info (except for LINE
TABULATION) with @documentencoding utf-8.  The test file is attached.
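
To make the difference between the two readings of \s concrete, here is a
minimal sketch.  It is written in Python rather than perl, so it only
illustrates the general ASCII-versus-Unicode distinction and is not a
description of perl's exact rules (Python's ASCII class also includes \v,
for instance).  The names CANDIDATES and classify are made up for the
illustration.

```python
import re

# A few candidate characters: the minimal set from the text plus some
# Unicode spaces.  The selection is illustrative, not exhaustive.
CANDIDATES = {
    "CHARACTER TABULATION": "\t",
    "LINE FEED": "\n",
    "FORM FEED": "\f",
    "CARRIAGE RETURN": "\r",
    "SPACE": " ",
    "NO-BREAK SPACE": "\u00a0",
    "EN QUAD": "\u2000",
    "IDEOGRAPHIC SPACE": "\u3000",
}

ASCII_SPACE = re.compile(r"\s", re.ASCII)  # ASCII-only matching
UNICODE_SPACE = re.compile(r"\s")          # Unicode matching (default for str)


def classify(chars):
    """Map each name to (matches under ASCII rules, matches under Unicode rules)."""
    return {
        name: (
            ASCII_SPACE.fullmatch(char) is not None,
            UNICODE_SPACE.fullmatch(char) is not None,
        )
        for name, char in chars.items()
    }


RESULT = classify(CANDIDATES)
for name, (is_ascii_space, is_unicode_space) in RESULT.items():
    print(f"{name:22} ascii={is_ascii_space} unicode={is_unicode_space}")
```

In this sketch, characters such as EN QUAD and NO-BREAK SPACE only count
as spaces under the Unicode rules, which is the kind of setting-dependent
behavior described above.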

So, this means that all those spaces, except for LINE TABULATION, lose
their special meaning in Info output.  I think that it would be nice
to have something that is both sensible and consistent with TeX/LaTeX,
with being sensible coming first.  It seems that the makeinfo in C
explicitly considered spaces to be something along the lines of
[\r\n\t ].

So, what should be done?  Should parsing do something different, or is
it ok to have all the space-like characters be considered as spaces?
And for the output?  Break words only at [\r\n\t ]?  Keep the first
space character only if it is not [\r\n]?
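
As a rough illustration of what "break words only at [\r\n\t ]" would
change for the output, here is a small Python sketch (an analogy only,
not the perl converter code; split_words and SAMPLE are hypothetical
names).  It compares splitting at the explicit class with splitting at a
Unicode-aware \s.

```python
import re

EXPLICIT_BREAKS = re.compile(r"[\r\n\t ]+")  # the class proposed in the text
ANY_SPACE = re.compile(r"\s+")               # Python's Unicode-aware \s


def split_words(text, pattern):
    """Split text at runs matched by pattern, dropping empty pieces."""
    return [word for word in pattern.split(text) if word]


# EN QUAD between "one" and "two", NO-BREAK SPACE between "three" and "four".
SAMPLE = "one\u2000two three\u00a0four\tfive"

EXPLICIT_WORDS = split_words(SAMPLE, EXPLICIT_BREAKS)
GENERIC_WORDS = split_words(SAMPLE, ANY_SPACE)

print(len(EXPLICIT_WORDS), "words with the explicit class")
print(len(GENERIC_WORDS), "words with the generic class")
```

With the explicit class, the EN QUAD and the NO-BREAK SPACE stay inside
their words and so survive in the output; with the generic class they
are treated as ordinary break points.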





As a side note, when trying all the spaces listed in the perl
documentation without @documentencoding, the result is messed up,
certainly because of unicode.  If @documentencoding us-ascii is used,
the result is not pretty (though this does not have much to do with
spaces): perl complains, rightly, although we may want to catch that to
give another error message:
ascii "\xA0" does not map to Unicode at Texinfo/Parser.pm line 1909, <FH> line 1.
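
Catching that failure to give a friendlier message could look roughly
like the following.  This is again a Python sketch of the idea, not the
code in Texinfo/Parser.pm; decode_line and the wording of the message
are hypothetical.

```python
def decode_line(raw: bytes, encoding: str, line_number: int) -> str:
    """Decode one input line, turning a decoding failure into a clearer error."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the offending bytes in raw.
        bad = raw[err.start:err.end]
        raise ValueError(
            f"line {line_number}: byte(s) {bad!r} at offset {err.start} "
            f"are not valid in the declared encoding '{encoding}'"
        ) from err


if __name__ == "__main__":
    try:
        # A NO-BREAK SPACE encoded as latin-1 in a document declared as ASCII.
        decode_line(b"a\xa0b", "ascii", 1)
    except ValueError as exc:
        print(exc)
```

The point is only that the error can name the input line and the
declared encoding instead of exposing the internal location.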

-- 
Pat

Attachment: test_spaces.texi
Description: TeXInfo document

