Re: different encodings for input and output file names and command line

bug-texinfo

From:	Gavin Smith
Subject:	Re: different encodings for input and output file names and command line
Date:	Wed, 2 Mar 2022 19:18:48 +0000

On Wed, Mar 02, 2022 at 10:42:34AM +0100, Patrice Dumas wrote:
> I just tested that it does not happen with the perl Parser.  I let you
> look in more details, maybe the issue is that top->source_info is not
> current_source_info at the point of the line directive.

It actually has nothing to do with being in a node or not.  It happens
because the #line directive occurs so soon after '\input texinfo' that it
is read while collecting the 'preamble_before_beginning' element and
then added back to the input stream as text (like an expanded macro), so
the source line information is attached to this added text, not to
the input file.
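
To make the mechanism concrete, here is a toy model of an input stack in
which a line directive is always applied to the topmost entry.  It is only
a sketch: every name in it (ToyInput, toy_stack, toy_line_directive_top and
so on) is hypothetical and it is not the parsetexi code.  It assumes the
file entry has already read two lines and that the second line has been
re-added on top of the stack as text.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the parsetexi ones. */
enum toy_type { TOY_FILE, TOY_TEXT };

typedef struct {
    enum toy_type type;
    int line_nr;            /* line number reported for this entry */
} ToyInput;

#define TOY_MAX 8
static ToyInput toy_stack[TOY_MAX];
static int toy_count = 0;

/* Push a new entry on top of the toy input stack. */
static void toy_push(enum toy_type type, int line_nr)
{
    assert(toy_count < TOY_MAX);
    toy_stack[toy_count].type = type;
    toy_stack[toy_count].line_nr = line_nr;
    toy_count++;
}

/* Apply a "#line N" directive to whatever entry is on top of the stack. */
static void toy_line_directive_top(int line_nr)
{
    assert(toy_count > 0);
    toy_stack[toy_count - 1].line_nr = line_nr;
}

/* Replay the scenario and return the line number held by the file entry
   after the directive has been applied. */
static int toy_demo_file_line(void)
{
    toy_count = 0;
    toy_push(TOY_FILE, 2);  /* file entry: two lines already read */
    toy_push(TOY_TEXT, 2);  /* the "#line" line re-added as text */
    toy_line_directive_top(100);
    return toy_stack[0].line_nr;
}
```

In this model the text entry ends up with line number 100 while the file
entry keeps its old value, which is the symptom described above.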

I believe the #line directive only applies to the topmost input file,
so there should usually be no need to apply it to lower down entries
on the input stack.  Moreover the directive should only be coming from
input files, not from expanded macros, so while the following fixes the bug:

diff --git a/tp/Texinfo/XS/parsetexi/input.c b/tp/Texinfo/XS/parsetexi/input.c
index 725f0881b0..0cd2359612 100644
--- a/tp/Texinfo/XS/parsetexi/input.c
+++ b/tp/Texinfo/XS/parsetexi/input.c
@@ -308,11 +308,18 @@ save_line_directive (int line_nr, char *filename)
 {
   char *f = 0;
   INPUT *top;
+  int i;

   if (filename)
     f = encode_file_name (filename);

-  top = &input_stack[input_number - 1];
+  /* skip added text */
+  i = input_number - 1;
+  while (i >= 0 && input_stack[i].type == IN_text)
+    i--;
+  if (i < 0)
+    return;
+  top = &input_stack[i];
   if (line_nr)
     top->source_info.line_nr = line_nr;
   if (filename)

it's highly confusing as the code will only be needed in this one
special case.  (Later I realised it might also cause problems with the
test suite and would have to work for text input too.)

(I expect the Perl code works because the $source_info variable in
_parse_texi_document refers to a shared hash, but I haven't checked this.)

I had thought of moving the checking of #line directives deeper into the
input code, but they don't work inside certain blocks, like @verbatim.

I tried tweaking the code to avoid fetching the extra line after
\input in the first place by stopping after this line is seen:

diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 113d51e101..91261e32a6 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -1043,10 +1043,13 @@ sub _parse_texi_document($)
   my $preamble_before_beginning;
   while (1) {
     my $line;
+    my $is_input_line;
     ($line, $source_info) = _next_text($self, $source_info);
     last if (!defined($line));
+
+    $is_input_line = ($line =~ /^ *\\input/);
     # non ascii spaces do not start content
-    if ($line =~ /^ *\\input/ or $line =~ /^\s*$/) {
+    if ($is_input_line or $line =~ /^\s*$/) {
       if (not defined($preamble_before_beginning)) {
         $preamble_before_beginning = {'type' => 'preamble_before_beginning',
                         'contents' => [], 'parent' => $before_node_section };
@@ -1062,6 +1065,7 @@ sub _parse_texi_document($)
       unshift @{$self->{'input'}->[0]->{'pending'}}, [$line, $source_info];
       last;
     }
+    last if $is_input_line;
   }

   my $tree = $self->_parse_texi($document_root, $before_node_section);

- but this led to extra blank lines in many output formats, so I didn't
think the change was worth making.

I fixed this in the easiest way I could think of, which was to add a
pushback field for one line only (commit 5c942cdc7a).
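
The general idea of a one-line pushback can be sketched as follows.  This
is not the code from the commit; it is a minimal illustration under the
assumption that input is an in-memory array of lines, and the names
(LineReader, reader_next, reader_unread) are made up for the example.  The
point is that the pushed-back line stays inside the same input source, so
it is re-read with the same line number instead of becoming a separate
text entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader over a NULL-terminated array of lines. */
typedef struct {
    const char **lines;     /* the input lines, ending with NULL */
    size_t next;            /* index of the next unread line */
    const char *pushback;   /* at most one pushed-back line, or NULL */
    int line_nr;            /* number of the line most recently returned */
} LineReader;

static void reader_init(LineReader *r, const char **lines)
{
    r->lines = lines;
    r->next = 0;
    r->pushback = NULL;
    r->line_nr = 0;
}

/* Return the next line, or NULL at end of input.  A pushed-back line is
   returned first. */
static const char *reader_next(LineReader *r)
{
    const char *line;

    if (r->pushback) {
        line = r->pushback;
        r->pushback = NULL;
    } else {
        if (r->lines[r->next] == NULL)
            return NULL;
        line = r->lines[r->next++];
    }
    r->line_nr++;
    return line;
}

/* Give back the line that was just read.  Only one line can be held. */
static void reader_unread(LineReader *r, const char *line)
{
    assert(r->pushback == NULL);
    r->pushback = line;
    r->line_nr--;
}
```

A caller that reads one line too many while collecting the prelude can
call reader_unread() on it; the next reader_next() returns the same line
with the same line number.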

I believe you had changes ongoing for the "file prelude" so getting rid
of the need for the 'preamble_before_beginning' element could be a good
part of this.

Another way might be to add special input code to trim off and return
a file prelude.  This would move the handling of the prelude from the "parser"
code to the "input" code.  It would avoid the problematic "pushing back" of
input and would be a clean way of doing this, as it would isolate the handling
of the "\input" line from the other parsing code.

I understand that the main purpose of the preamble_before_beginning element
is not to lose information so that the original Texinfo file could be
regenerated.  If that's the case, maybe the input code could return
all the text in this preamble as one long string - it wouldn't have to be
line by line.
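
As a rough sketch of what returning the prelude as one string could look
like, the function below measures how many leading bytes of a buffer belong
to the prelude, using the same two tests as the Perl loop quoted earlier
(a line starting with optional spaces and "\input", or a whitespace-only
line).  The function names are hypothetical, the whole file is assumed to
be in memory, and isspace() only approximates Perl's \s.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Does the line s[0..len) consist only of whitespace? */
static int is_blank_line(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (!isspace((unsigned char) s[i]))
            return 0;
    return 1;
}

/* Does the line start with optional spaces followed by "\input"? */
static int is_input_line(const char *s, size_t len)
{
    size_t i = 0;

    while (i < len && s[i] == ' ')
        i++;
    return len - i >= 6 && strncmp(s + i, "\\input", 6) == 0;
}

/* Return the length in bytes of the prelude at the start of text: the
   run of leading "\input" lines and blank lines. */
static size_t prelude_length(const char *text)
{
    const char *p = text;

    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t) (nl - p) + 1 : strlen(p);

        if (!is_input_line(p, len) && !is_blank_line(p, len))
            break;
        p += len;
    }
    return (size_t) (p - text);
}
```

The first prelude_length(text) bytes could then be kept as a single string,
with parsing starting at the following byte, so nothing is read and pushed
back.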

This would require more changes though.  I didn't want to put much more
work into this if the code could change anyway.
