bug-texinfo

Re: support figure space (U+2007)


From: Gavin Smith
Subject: Re: support figure space (U+2007)
Date: Fri, 14 Jan 2022 17:57:44 +0000
User-agent: Mutt/1.9.4 (2018-02-28)

On Mon, Jan 03, 2022 at 10:22:36AM +0100, Patrice Dumas wrote:
> On Sun, Jan 02, 2022 at 08:20:18AM +0000, Werner LEMBERG wrote:
> > 
> > The nice thing is that it would work out of the box with HTML
> > browsers, too.  On the other hand, maybe there could be some further
> > massaging to convert U+2007 to an ordinary space entity together with
> > some formatting CSS so that cut-and-pasted text doesn't contain U+2007.
> 
> I think that we should simply leave the Unicode character or entity as
> is in HTML and let browsers and users handle it.

I've tested it: the figure space is preserved in Info output, but in
HTML output it is removed.

\input texinfo

@multitable {999999999} {xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}
@item    55 @tab explanation
@item   535 @tab explanation
@item 49303 @tab explanation
@end multitable


@bye

(The spaces before the numbers in this example should be figure spaces.)

With the current code, the HTML output is simply:

<table class="multitable">
<tbody><tr><td>55</td><td>explanation</td></tr>
<tr><td>535</td><td>explanation</td></tr>
<tr><td>49303</td><td>explanation</td></tr>
</tbody>
</table>
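The padding disappears because, on a character string, Perl's \s applies Unicode rules to code points above 255, and U+2007 has the Unicode White_Space property, so the unpatched trimming strips it along with ordinary spaces. A minimal standalone sketch, assuming the cell text reaches _convert_tab_command as a decoded Perl string (the variable names are illustrative, not taken from HTML.pm):

```perl
use strict;
use warnings;

# Cell content as it might reach the trimming code: two figure spaces
# (U+2007) used to right-align "55", plus a trailing ordinary space.
my $content = "\x{2007}\x{2007}55 ";

# Same two substitutions as the unpatched _convert_tab_command.
(my $trimmed = $content) =~ s/^\s*//;
$trimmed =~ s/\s*$//;

# \s matched U+2007, so only "55" is left.
print length($trimmed), "\n";    # prints 2
```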

This can be changed with the following patch:

diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 0426253d06..e67f127253 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -4260,8 +4260,8 @@ sub _convert_tab_command ($$$$)
     }
   }
 
-  $content =~ s/^\s*//;
-  $content =~ s/\s*$//;
+  $content =~ s/^[^\S\x{2007}]*//;
+  $content =~ s/[^\S\x{2007}]*$//;
 
   if ($self->in_string()) {
     return $content;


which exempts the figure space from whitespace trimming in @multitable.
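How the negated class in this patch works: [^\S\x{2007}] matches a character that is neither non-whitespace nor U+2007, i.e. any whitespace except the figure space. A small sketch of the patched substitutions in isolation; the sample string is an assumption chosen for illustration:

```perl
use strict;
use warnings;

# Figure-space padding surrounded by ordinary leading/trailing whitespace.
my $content = " \x{2007}\x{2007}55 \t";

# Strip whitespace other than U+2007 from both ends.
(my $trimmed = $content) =~ s/^[^\S\x{2007}]*//;
$trimmed =~ s/[^\S\x{2007}]*$//;

# The space and tab are gone; the two figure spaces remain.
print $trimmed eq "\x{2007}\x{2007}55" ? "kept\n" : "lost\n";    # prints "kept"
```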

However, this is ugly and confusing, and it is not clear whether other
special spaces should also be exempted.
According to the Perl documentation

https://perldoc.perl.org/perlre#/a-(and-/aa)
https://perldoc.perl.org/perlrecharclass#Whitespace

the /a modifier can be used instead to restrict \s to ASCII whitespace
characters:

diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 0426253d06..7df5e6e955 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -4260,8 +4260,8 @@ sub _convert_tab_command ($$$$)
     }
   }
 
-  $content =~ s/^\s*//;
-  $content =~ s/\s*$//;
+  $content =~ s/^\s*//a;
+  $content =~ s/\s*$//a;
 
   if ($self->in_string()) {
     return $content;
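A quick check of what /a changes, assuming Perl 5.14 or later (where the modifier was introduced); the strings are made up for illustration:

```perl
use strict;
use warnings;

my $fig = "\x{2007}";    # FIGURE SPACE

# Without /a, \s uses Unicode rules for this character and matches it.
my $unicode_match = $fig =~ /\s/  ? 1 : 0;
# With /a, \s matches only ASCII whitespace (space, tab, CR, LF, ...).
my $ascii_match   = $fig =~ /\s/a ? 1 : 0;

print "unicode: $unicode_match, ascii: $ascii_match\n";    # prints "unicode: 1, ascii: 0"

# The trimming from the second patch: ASCII spaces go, the figure space stays.
(my $trimmed = " \x{2007}535 ") =~ s/^\s*//a;
$trimmed =~ s/\s*$//a;
print length($trimmed), "\n";    # prints 4
```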

Are there any comments before I commit this?  It fixes this one case,
but \s (and \w, \S, \W) are used widely throughout the code.
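The /a modifier is not specific to \s: it also narrows \w (and therefore \W and \S), so applying it widely would stop non-ASCII letters from counting as word characters. Perl's `use re '/a'` pragma can make /a the default within a lexical scope; this is only one option that might be considered, not something proposed above. A minimal sketch, assuming Perl 5.14 or later, with illustrative Greek letters as input:

```perl
use strict;
use warnings;

my $word = "\x{3b1}\x{3b2}\x{3b3}";    # Greek alpha, beta, gamma

# Default rules: these letters are word characters.
my $default = $word =~ /^\w+$/ ? 1 : 0;

my $scoped;
{
    # Makes /a the default for regexes compiled in this block only.
    use re '/a';
    $scoped = $word =~ /^\w+$/ ? 1 : 0;
}

print "default: $default, with /a: $scoped\n";    # prints "default: 1, with /a: 0"
```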
