bug-texinfo

Re: texinfo in Google Summer of Code


From: Gavin Smith
Subject: Re: texinfo in Google Summer of Code
Date: Wed, 10 Feb 2021 21:40:21 +0000
User-agent: Mutt/1.9.4 (2018-02-28)

On Wed, Feb 10, 2021 at 12:15:39PM -0800, Per Bothner wrote:
> On 2/10/21 1:07 AM, Gavin Smith wrote:
> > For the long term health of the Texinfo project, more people do need
> > to understand this Perl code. Nobody has come forward to write new
> > back-ends to generate other output formats (LaTeX, Org Mode,
> > MediaWiki...). I don't think it's much to do with the fact it's
> > written in Perl.
> 
> It doesn't help - Perl was never all that popular, and is less so than
> it used to be.  I never really learned it, though I did figure out how
> to make some changes to the code.

Speaking from personal experience, my biggest issue with Perl is not,
as people often say, the syntax: it is the lack of types.  It makes
it harder to understand what code is supposed to do when everything
is a "scalar" type that could be anything from a number or a character
string to a complex data structure.
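
To illustrate the point about scalars, here is a minimal sketch (not
taken from texi2any; describe() is a made-up helper) showing that the
only way to find out what a scalar holds is to inspect it at run time:

```perl
use strict;
use warnings;

# Hypothetical helper: describe what kind of value a scalar holds.
sub describe {
    my ($value) = @_;
    return 'undef'           if !defined $value;
    return 'hash reference'  if ref($value) eq 'HASH';
    return 'array reference' if ref($value) eq 'ARRAY';
    return 'code reference'  if ref($value) eq 'CODE';
    return 'plain scalar (number or string)';
}

# The same kind of variable can hold any of these.
my @values = (42, 'chapter', { 'cmdname' => 'node' }, [1, 2, 3]);
print describe($_), "\n" for @values;
```

Nothing in a declaration like "my $value" says which of these cases
the rest of the code expects.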

I do not think that Perl is the best choice for a large complex program
like this one, both for speed and comprehensibility, but it's completely
impossible at this stage to move away from it.

texi2any is split into modules, each only a few thousand lines of code,
and I don't think it's impossible for new people to make sense of
individual modules.

> I recently (and belatedly) converted the DomTerm backend from C to C++,
> though I've only started to make use of C++ functionality.  That is one
> advantage of C++, especially given an existing C codebase: You can
> incrementally C++-ify and refactor it.  That would have been a better path
> for makeinfo - but of course it all depends on what volunteers want to do.
> However, I think more people know C++ than Perl and I think it is easier
> to write efficient readable code.
> 
> However, that's water under the bridge, unless someone wants to
> resurrect the old makeinfo C code and migrate it to C++, preferably
> learning from the Perl code in designing the C++ classes.

There's no point as the old code had a completely different design.

I have actually warmed to C++ lately, especially with the Qt library.
It really simplifies memory handling.  The number one problem with
C++ is that the compile times are awful.

> > I have some ideas what the problems could be
> > but would like to hear what others think.

I think some of the functions in ParserNonXS.pm would benefit from
being split up into smaller functions (as I did in the C rewrite),
although this would hurt performance due to the high cost of a
function call in Perl.
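
One way to measure the function call overhead is with the core
Benchmark module.  This is only a rough sketch: add_one(),
count_inline() and count_with_calls() are made-up names, and the loop
body is far simpler than anything in the parser, so the ratio it
reports should not be read as a figure for ParserNonXS.pm.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper standing in for a small split-out function.
sub add_one {
    my ($n) = @_;
    return $n + 1;
}

# The work done inline in the loop body.
sub count_inline {
    my ($limit) = @_;
    my $total = 0;
    $total = $total + 1 for 1 .. $limit;
    return $total;
}

# The same work, going through a function call each time.
sub count_with_calls {
    my ($limit) = @_;
    my $total = 0;
    $total = add_one($total) for 1 .. $limit;
    return $total;
}

# Run each variant for at least one CPU second and compare the rates.
cmpthese(-1, {
    'inline'     => sub { count_inline(10_000) },
    'with calls' => sub { count_with_calls(10_000) },
});
```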

The other issue is that the code operates on a large tree structure
which represents the document, which is shared state for all the
code that operates on it.  It's sometimes hard to know where you
are in the tree and what other parts of the code are supposed to
have done to it.  I'm not sure if there is any solution to this.
The parser is full of code like

  $current = $current->{'parent'};

which moves up in the parse tree.  I am not sure if there is
any hope for documenting which bit of the tree you are in
using some kind of run-time type checking.  All that is done
in the C code is to use a single ELEMENT type for all tree
elements.  Does anybody know if this issue is dealt with well
in other programs that operate on parse trees, like compilers?
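
On the run-time checking idea, one possible shape is a helper that
states where the caller believes it is before moving up.  This is a
sketch under assumptions, not code from the parser: leave_element()
is a made-up name, and it assumes every element records its kind
under a 'type' key.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: move up one level, but only if the current element
# is of the kind the caller believes it is.
sub leave_element {
    my ($current, $expected_type) = @_;
    my $actual = defined $current->{'type'} ? $current->{'type'} : '(none)';
    croak "expected to be in '$expected_type', but in '$actual'"
        if $actual ne $expected_type;
    croak "element '$actual' has no parent" if !defined $current->{'parent'};
    return $current->{'parent'};
}

# A tiny two-level tree: a paragraph inside a root element.
my $root = { 'type' => 'document_root', 'contents' => [] };
my $para = { 'type' => 'paragraph', 'parent' => $root };
push @{ $root->{'contents'} }, $para;

my $current = $para;
$current = leave_element($current, 'paragraph');   # now at the root
print $current->{'type'}, "\n";
```

Compared with a bare "$current = $current->{'parent'};", the call
documents the expected position in the tree and fails loudly when
that expectation is wrong, at the price of an extra function call.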

> > Did you try to understand the code and fail?  What were the difficulties?
> > Could the code be changed to make it more accessible?
> 
> I was able to make some local changes, and add/change the generated html
> in places.  However, some more complicated things I couldn't figure out
> in the time I spent on it.  The control flow with the table-driven
> processing was confusing, especially for anyone not fluent in Perl.

That is useful to know.  I guess this is the HTML part of it you are
talking about here.  There is quite a lot of indirection in this code.
For example, to see what the code is for the @vtable command, you
search for 'vtable' in HTML.pm and you see there is a line

  $default_commands_conversion{'vtable'} = \&_convert_xtable_command;

along with the definition of _convert_xtable_command.  However, it's
harder to find out when and how this is ever used.  The
%default_commands_conversion hash is copied elsewhere in the middle of
this block of code:

  foreach my $command (keys(%misc_commands), keys(%brace_commands),
     keys (%block_commands), keys(%no_brace_commands), 'value') {
    if (exists($Texinfo::Config::texinfo_commands_conversion{$command})) {
      $self->{'commands_conversion'}->{$command}
          = $Texinfo::Config::texinfo_commands_conversion{$command};
    } else {
      if ($self->get_conf('FORMAT_MENU') ne 'menu'
           and ($command eq 'menu' or $command eq 'detailmenu')) {
        $self->{'commands_conversion'}->{$command} = undef;
      } elsif ($format_raw_commands{$command}
               and !$self->{'expanded_formats_hash'}->{$command}) {
        # no conversion function is set for a raw format command
        # that is not expanded
      } elsif (exists($default_commands_conversion{$command})) {
        $self->{'commands_conversion'}->{$command}
           = $default_commands_conversion{$command};
        if ($command eq 'menu' and $self->get_conf('SIMPLE_MENU')) {
          $self->{'commands_conversion'}->{$command}
            = $default_commands_conversion{'example'};
        }
      }
    }
  }

and that is perhaps hard to follow.  I would have to think if it
could be simplified at all, but it might not be possible.
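
Stripped of the special cases, the pattern in the loop above is a
dispatch table: a hash from command name to code reference, filled in
once and looked up later.  A minimal sketch, with made-up names
(convert_xtable, convert_example, %user_conversion, convert_command)
that are not the ones used in HTML.pm:

```perl
use strict;
use warnings;

# Hypothetical conversion functions; they only return a string here.
sub convert_xtable  { my ($cmd) = @_; return "<dl><!-- $cmd --></dl>"; }
sub convert_example { my ($cmd) = @_; return "<pre><!-- $cmd --></pre>"; }

# Defaults, in the style of %default_commands_conversion.
my %default_conversion = (
    'vtable'  => \&convert_xtable,
    'ftable'  => \&convert_xtable,
    'example' => \&convert_example,
);

# User overrides take precedence over the defaults (assumed empty here).
my %user_conversion = ();

# Build the per-converter table once.
my %commands_conversion;
foreach my $command (keys %default_conversion) {
    $commands_conversion{$command} = exists $user_conversion{$command}
        ? $user_conversion{$command}
        : $default_conversion{$command};
}

# Later, converting a command is a hash lookup followed by a call.
sub convert_command {
    my ($command) = @_;
    my $converter = $commands_conversion{$command};
    return '' if !defined $converter;    # no conversion registered
    return $converter->($command);
}

print convert_command('vtable'), "\n";
```

The indirection is what makes the code hard to trace: the place where
a function is registered and the place where it is called have no
function name in common.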

To take another example, in a few places in the code
there are calls to output the footnotes like

    my $foot_text = &{$self->{'format_footnotes_text'}}($self);

However, searching the code for the string "format_footnotes_text"
doesn't show it being set anywhere, only used.  It's a bit of a
mystery to try to work this out (the "format_footnotes_text" key is
set after concatenating the two strings "format_" and
"footnotes_text").
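
The effect can be reproduced in a few lines.  This is a simplified
sketch rather than the code in HTML.pm; %default_formatting and the
two default_* functions are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical default formatting functions.
sub default_footnotes_text { return "footnotes\n"; }
sub default_heading_text   { return "heading\n"; }

my %default_formatting = (
    'footnotes_text' => \&default_footnotes_text,
    'heading_text'   => \&default_heading_text,
);

# The keys are built by concatenation, so the full key name is never
# written out as a literal string at the point of assignment.
my $self = {};
foreach my $name (keys %default_formatting) {
    $self->{ 'format_' . $name } = $default_formatting{$name};
}

# The call site, on the other hand, spells the key out in full.
my $foot_text = &{ $self->{'format_footnotes_text'} }($self);
print $foot_text;
```

Searching for the full key only finds the call site; searching for
"footnotes_text" on its own finds the assignment as well.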



