texinfo-commits

branch master updated: * tp/Texinfo/Convert/HTML.pm (_css_string_accent)


From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Convert/HTML.pm (_css_string_accent) (_simplify_text_for_comparison, _default_format_element_footer): use Unicode properties and character classes that match non ascii letters and spaces when in regex where this is what is relevant and not ascii text only.
Date: Fri, 19 Aug 2022 18:13:41 -0400

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 4fa03abd72 * tp/Texinfo/Convert/HTML.pm (_css_string_accent) 
(_simplify_text_for_comparison, _default_format_element_footer): use Unicode 
properties and character classes that match non ascii letters and spaces when 
in regex where this is what is relevant and not ascii text only.
4fa03abd72 is described below

commit 4fa03abd7278d888b0a56441f0b0559c7004e281
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Sat Aug 20 00:13:30 2022 +0200

    * tp/Texinfo/Convert/HTML.pm (_css_string_accent)
    (_simplify_text_for_comparison, _default_format_element_footer):
    use Unicode properties and character classes that match non
    ascii letters and spaces when in regex where this is what is relevant
    and not ascii text only.
---
 ChangeLog                  | 8 ++++++++
 tp/Texinfo/Convert/HTML.pm | 8 +++++---
 2 files changed, 13 insertions(+), 3 deletions(-)
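
One case where the old `\w` patterns and the new `\p{Word}` patterns disagree: under Perl's default `/d` character-set rules, `\w` matches only ASCII word characters among code points 128 to 255 when the string is not internally UTF-8 encoded and the `unicode_strings` feature is off. A pattern that contains `\p{...}` is always matched with Unicode rules (see perlre, character set modifiers). When Unicode rules are in effect, `\w` and `\p{Word}` match the same characters. The minimal sketch below is not part of the commit. It reuses the old and new patterns from `_css_string_accent` and `_simplify_text_for_comparison`. The helper names `old_is_word_char`, `new_is_word_char`, `old_simplify`, `new_simplify` and `show` are hypothetical.

```perl
#!/usr/bin/perl
# Illustration only (not Texinfo code): contrasts \w with \p{Word} under
# Perl's default /d rules. There is deliberately no "use v5.12" or
# "use feature 'unicode_strings'", and the test strings are built with
# \x escapes, so they are NOT internally UTF-8 encoded.
use strict;
use warnings;

# Old and new single-character test, as in _css_string_accent.
sub old_is_word_char { return $_[0] =~ /^\w$/       ? 1 : 0 }
sub new_is_word_char { return $_[0] =~ /^\p{Word}$/ ? 1 : 0 }

# Old and new removal of non-word characters, as in
# _simplify_text_for_comparison.
sub old_simplify { my $t = shift; $t =~ s/[^\w]//g;       return $t }
sub new_simplify { my $t = shift; $t =~ s/[^\p{Word}]//g; return $t }

# Show characters above 127 as \x{..} so the output stays ASCII.
sub show {
  return join '', map { ord($_) > 127 ? sprintf('\\x{%x}', ord($_)) : $_ }
                      split(//, $_[0]);
}

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, stored as one native byte.
my $e_acute = "\xe9";
my $phrase  = "Caf\xe9 au lait!";

print 'old /^\w$/ on e-acute:       ', old_is_word_char($e_acute), "\n";
print 'new /^\p{Word}$/ on e-acute: ', new_is_word_char($e_acute), "\n";
print 'old simplify: ', show(old_simplify($phrase)), "\n";
print 'new simplify: ', show(new_simplify($phrase)), "\n";
```

Run as is, it prints `0` for the old single-character test and `1` for the new one, then `Cafaulait` for the old stripping (the accented letter is dropped) and `Caf\x{e9}aulait` for the new one.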

diff --git a/ChangeLog b/ChangeLog
index 3f999d81a9..8b09364e93 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2022-08-19  Patrice Dumas  <pertusus@free.fr>
+
+       * tp/Texinfo/Convert/HTML.pm (_css_string_accent)
+       (_simplify_text_for_comparison, _default_format_element_footer):
+       use Unicode properties and character classes that match non
+       ascii letters and spaces when in regex where this is what is relevant
+       and not ascii text only.
+
 2022-08-19  Patrice Dumas  <pertusus@free.fr>
 
        Use gnulib wcwidth in tp/Texinfo/XS/xspara.c
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index af7fd4565c..1e649e03f0 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -2968,7 +2968,7 @@ sub _css_string_accent($$$;$)
 
   my $accent = $command->{'cmdname'};
 
-  if ($in_upper_case and $text =~ /^\w$/) {
+  if ($in_upper_case and $text =~ /^\p{Word}$/) {
     $text = uc ($text);
   }
   if (exists($Texinfo::Convert::Unicode::unicode_accented_letters{$accent})
@@ -5634,7 +5634,7 @@ $default_css_string_types_conversion{'text'} = \&_css_string_convert_text;
 sub _simplify_text_for_comparison($)
 {
   my $text = shift;
-  $text =~ s/[^\w]//g;
+  $text =~ s/[^\p{Word}]//g;
   return $text;
 }
 
@@ -6364,7 +6364,9 @@ sub _default_format_element_footer($$$$)
       if ($self->get_conf('HEADERS')) {
         my $no_footer_word_count;
         if ($self->get_conf('WORDS_IN_PAGE')) {
-          my @cnt = split(/\W*\s+\W*/, $content);
+          # FIXME it seems that NO-BREAK SPACE and NEXT LINE (NEL) may
+          # not be in \h and \v in some case, but not sure which case it is
+          my @cnt = split(/\P{Word}*[\h\v]+\P{Word}*/, $content);
           if (scalar(@cnt) < $self->get_conf('WORDS_IN_PAGE')) {
             $no_footer_word_count = 1;
           }



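The last hunk changes how words are counted against `WORDS_IN_PAGE`. perlrecharclass documents that `\h` and `\v`, unlike `\s` and `\w`, match the same characters regardless of the locale or of the internal encoding of the string, and that they include NO-BREAK SPACE (U+00A0, in `\h`) and NEXT LINE (U+0085, in `\v`). The following minimal sketch, which is not part of the commit, runs the old and new split patterns on strings that are not internally UTF-8 encoded. The helpers `split_old`, `split_new` and `show` and the sample strings are hypothetical.

```perl
#!/usr/bin/perl
# Illustration only (not Texinfo code): compares the old and new
# word-splitting patterns of _default_format_element_footer under the
# default /d rules, on strings that are NOT internally UTF-8 encoded.
use strict;
use warnings;

# Hypothetical helpers returning the fields each pattern produces.
sub split_old { return split(/\W*\s+\W*/, $_[0]) }
sub split_new { return split(/\P{Word}*[\h\v]+\P{Word}*/, $_[0]) }

# Show characters above 127 as \x{..} so the output stays ASCII.
sub show {
  return join '', map { ord($_) > 127 ? sprintf('\\x{%x}', ord($_)) : $_ }
                      split(//, $_[0]);
}

# "un", NO-BREAK SPACE (U+00A0), "deux", SPACE, "trois".
my $nbsp_text   = "un\xa0deux trois";
# "caf" + U+00E9, SPACE, "cr" + U+00E8 + "me".
my $accent_text = "caf\xe9 cr\xe8me";

for my $case ([old => \&split_old], [new => \&split_new]) {
  my ($label, $splitter) = @$case;
  my @nbsp_fields   = $splitter->($nbsp_text);
  my @accent_fields = $splitter->($accent_text);
  print "$label: ", scalar(@nbsp_fields), " fields: ",
        join(' | ', map { show($_) } @nbsp_fields), "\n";
  print "$label: ", scalar(@accent_fields), " fields: ",
        join(' | ', map { show($_) } @accent_fields), "\n";
}
```

With the old pattern, the NO-BREAK SPACE does not separate words (`un\x{a0}deux | trois`, 2 fields) and the leading `\W*` swallows the accented letter at the end of a word (`caf | cr\x{e8}me`). With the new pattern, the output is `un | deux | trois` (3 fields) and `caf\x{e9} | cr\x{e8}me`.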