Re: different encodings for input and output file names and command line

bug-texinfo

From:	Patrice Dumas
Subject:	Re: different encodings for input and output file names and command line
Date:	Wed, 2 Mar 2022 22:03:01 +0100

On Wed, Mar 02, 2022 at 07:18:48PM +0000, Gavin Smith wrote:
> On Wed, Mar 02, 2022 at 10:42:34AM +0100, Patrice Dumas wrote:
> > I just tested that it does not happen with the Perl Parser.  I'll let you
> > look in more detail; maybe the issue is that top->source_info is not
> > current_source_info at the point of the line directive.
> 
> It's actually nothing to do with being in a node or not, it's due to
> the #line directive occurring so soon after '\input texinfo' that it
> is read while collecting the 'preamble_before_beginning' element, and
> added back to the input stream as text (like an expanded macro), so
> the source line information is attached to this added text, not to
> the input file.

I get it. 
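
The mechanism described above can be illustrated with a small Python model.
It is only a sketch, not the Texinfo code: the Source and Input classes, the
file names, and the line numbers are all made up for illustration, and it
assumes that a #line directive updates the source information of whatever
entry is on top of the input stack.

```python
import re


class Source:
    """One entry on the input stack, with its own source information."""

    def __init__(self, lines, file_name, line_nr=0):
        self.lines = list(lines)
        self.info = {"file_name": file_name, "line_nr": line_nr}


class Input:
    """Hypothetical input stack; the top entry is read first."""

    def __init__(self):
        self.stack = []

    def push(self, source):
        self.stack.append(source)

    def next_line(self):
        while self.stack:
            top = self.stack[-1]
            if top.lines:
                top.info["line_nr"] += 1
                return top.lines.pop(0)
            self.stack.pop()  # exhausted entries are dropped lazily
        return None

    def apply_line_directive(self, line_nr, file_name):
        # Assumption: the directive changes the entry currently on top.
        info = self.stack[-1].info
        info["line_nr"] = line_nr - 1  # the next line read becomes line_nr
        if file_name is not None:
            info["file_name"] = file_name


def demo(pushed_back_as_text):
    inp = Input()
    file_src = Source(['#line 100 "orig.texi"\n', "@node Top\n"], "doc.texi")
    inp.push(file_src)
    if pushed_back_as_text:
        # The preamble collection read one line too many and re-queued it
        # as a text entry, the way an expanded macro would be.
        extra = inp.next_line()
        inp.push(Source([extra], "doc.texi"))
    line = inp.next_line()
    m = re.match(r'#line\s+(\d+)(?:\s+"([^"]*)")?', line)
    inp.apply_line_directive(int(m.group(1)), m.group(2))
    inp.next_line()  # read "@node Top" from the file entry
    return file_src.info["file_name"], file_src.info["line_nr"]
```

In this model demo(False) leaves the file entry at orig.texi line 100, while
demo(True) leaves it at doc.texi line 2, because the directive only changed
the short-lived text entry.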

> (I expect the Perl code works because the $source_info variable in
> _parse_texi_document refers to a shared hash, but I haven't checked this.)

I do not really understand why it is not the same for the Perl Parser;
I tried to make the two implementations very similar in that part...

> I had thought of moving the checking of #line directives deeper into the
> input code, but they don't work inside certain blocks, like @verbatim.
> 
> I tried tweaking the code to avoid fetching the extra line after
> \input in the first place by stopping after this line is seen:
> 
> - but this led to extra blank lines being output in many output formats,
> which I didn't think was worth doing.

It would indeed be better if it could be avoided.

> I fixed this in the easiest way I could think of, which was to add a
> pushback field for one line only (commit 5c942cdc7a).

This looks OK.  The one drawback I see is that it adds an if that will be
true only once but is tested for every line; that is not a big deal,
though.
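
For reference, a one-line pushback field can look roughly like the Python
sketch below.  This is not commit 5c942cdc7a itself; FileInput, unread, and
the field names are hypothetical.  The point is that the re-queued line stays
with the file entry, so the source information is still the file's, and the
extra cost is one test per line read.

```python
class FileInput:
    """Hypothetical line reader with room to push back a single line."""

    def __init__(self, lines, file_name):
        self._lines = iter(lines)
        self.file_name = file_name
        self.line_nr = 0
        self._pushback = None

    def next_line(self):
        # This test runs for every line but succeeds only after unread().
        if self._pushback is not None:
            line, self._pushback = self._pushback, None
        else:
            line = next(self._lines, None)
            if line is None:
                return None
        self.line_nr += 1
        return line

    def unread(self, line):
        """Give back the line just read; only one line can be held."""
        if self._pushback is not None:
            raise RuntimeError("pushback slot already in use")
        self._pushback = line
        self.line_nr -= 1
```

After unread(), the next call to next_line() returns the same line again with
the same line number, and no separate text entry is created.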

> I believe you had changes ongoing for the "file prelude" so getting rid
> of the need for the 'preamble_before_beginning' element could be a good
> part of this.

The changes I have ongoing are not really for that part.

> Another way might be to add special input code to trim off and return
> a file prelude.  This would move the handling of this from the "parser" code
> to the "input" code.  This would avoid the problematic "pushing back" of input
> and would be a clean way of doing this.  It would isolate the handling of
> the "\input" line from the other parsing code.
> 
> I understand that the main purpose of the preamble_before_beginning element
> is not to lose information so that the original Texinfo file could be
> regenerated.  If that's the case, maybe the input code could return
> all the text in this preamble as one long string - it wouldn't have to be
> line by line.
> 
> This would require more changes though.  I didn't want to put much more
> work into this if the code could change anyway.

This looks good to me for now.  I will add your two paragraphs above
to the tp/TODO file and resync the Perl parser with your changes.
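
As a note for the TODO entry, the "trim off and return a file prelude" idea
could look roughly like this Python sketch.  It assumes that the prelude is
the run of leading blank lines and \input lines; that rule, the split_prelude
name, and the return values are assumptions made for illustration, not the
existing parser interface.

```python
import re

# Assumption: a prelude line is either blank or starts with \input.
PRELUDE_LINE = re.compile(r'^ *\\input|^\s*$')


def split_prelude(text):
    """Return (prelude, remaining_lines, first_line_nr).

    prelude is the leading part of text as one string, remaining_lines
    is the list of lines after it, and first_line_nr is the 1-based
    line number of the first remaining line.
    """
    lines = text.splitlines(keepends=True)
    n = 0
    while n < len(lines) and PRELUDE_LINE.match(lines[n]):
        n += 1
    return "".join(lines[:n]), lines[n:], n + 1
```

Because the prelude comes back as one string and the remaining lines are
untouched, nothing has to be pushed back, and the prelude followed by the
remaining lines still reproduces the original text.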

-- 
Pat
