bug-texinfo
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: not using OUTPUT_PERL_ENCODING to determine if _stream_encode should


From: Gavin Smith
Subject: Re: not using OUTPUT_PERL_ENCODING to determine if _stream_encode should encode
Date: Sun, 18 Feb 2024 18:09:23 +0000

On Sun, Feb 18, 2024 at 02:13:13PM +0100, Patrice Dumas wrote:
> Hello,
> 
> This is in the new encode once only stream code, so I ask here before
> making a change, I could be missing something, Gavin you probably
> will have comments/advices.
> 
> I do not think that it is a good thing to set OUTPUT_PERL_ENCODING to an
> empty string to prevent _stream_encode to encode if a character string
> is wanted, as is done in test_utils.pl in convert_to_plaintext if
> convert() is called on a Plaintext converter (same for Info).  I think
> that it should be another information, set in the converter if called
> through convert.  For two reasons, first OUTPUT_PERL_ENCODING
> information could still be needed later on when outputting to a file
> (this is the case in test_utils.pl) and second because convert() should
> always return a character strings independentely of any customization
> information.

What you propose sounds reasonable.  I understand the idea of 'convert'
always returning an unencoded string regardless of output format, although
the Info converter will have incorrect tag tables so is not a widely
useful function.  It seems that 'convert' is only used for the test suite,
am I right?

> 
> A more minor issue, if $self->{'encoding_object'} is not set, it will
> be retested for 'output_perl_encoding' value each time _stream_encode is
> called.  Maybe it would be better to set it to 0 or to undef and test if
> (!exists($self->{'encoding_object'})).

I don't like the possibility of having $self->{'encoding_object'} being
either an integer (0) or the result of Encode::find_encoding.  I also
don't like relying on the distinction between "defined" and "true"
(0 is defined but not true).  The semantics of "exists" but undefined
is easy to forget and relying on this would make the code more obscure.

Does the following look ok?

diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index bd7308f783..d944708c86 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -956,23 +956,24 @@ sub _stream_encode($$)
   my $self = shift;
   my $string = shift;
 
+  if ($self->{'encoding_disabled'}) {
+    return $string;
+  }
+
   if (!defined($self->{'encoding_object'})) {
     my $encoding = $self->{'output_perl_encoding'};
-    if ($encoding and $encoding ne 'ascii') {
-      my $Encode_encoding_object = Encode::find_encoding($encoding);
-      if (!defined($Encode_encoding_object)) {
-        Carp::croak "Unknown encoding '$encoding'";
-      }
-      $self->{'encoding_object'} = $Encode_encoding_object;
+    if (!$encoding or $encoding eq 'ascii') {
+      $self->{'encoding_disabled'} = 1;
+      return $string;
     }
+    my $Encode_encoding_object = Encode::find_encoding($encoding);
+    if (!defined($Encode_encoding_object)) {
+      Carp::croak "Unknown encoding '$encoding'";
+    }
+    $self->{'encoding_object'} = $Encode_encoding_object;
   }
 
-  if ($self->{'encoding_object'}) {
-    my $encoded = $self->{'encoding_object'}->encode($string);
-    return $encoded;
-  } else {
-    return $string;
-  }
+  return $self->{'encoding_object'}->encode($string);
 }
 


If this is ok, then "convert" could set $self->{'encoding_disabled'}.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]