Re: simplifying configuration of encoded characters/entities output

From: Gavin Smith
Subject: Re: simplifying configuration of encoded characters/entities output
Date: Wed, 29 Dec 2021 15:36:23 +0000
On Wed, Dec 29, 2021 at 03:31:49PM +0100, Patrice Dumas wrote:
> I proposed DISABLE_PUNCTUATION_ENCODING in a mail I just sent in the
> old thread.
> I will reproduce here what I said in that thread.  I think that doing
> what Alan wants would imply:
> * added quotes as ASCII
> * dashes and quotes appearing in the document ``, ---, ' as ASCII
> * some brace_no_arg_commands @-commands as ASCII, those that are not
>   in the 7bit ascii range and correspond to punctuation, maybe along
>   @minus, @dots, @enddots, @quotedblleft, @quotedblright,
>   @quoteleft, @quoteright.  Maybe also, but I am not sure,
>   @quotedblbase, @quotesinglbase.
> I think that your change does the first two, but no_extra_unicode does
> not correspond to the third point.  I am pretty sure that it prevents
> any conversion of @-commands like @l{} to unicode/utf8.

I checked, and it didn't.  @l{} was still output as the correct character with:

 ./texi2any.pl -c NO_UTF8_PUNCTUATION=1 ../doc/texinfo.texi

It works by checking %Texinfo::Convert::Unicode::extra_unicode_map,
which is the list of exclusions.
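
To make the logic concrete, here is a rough sketch of the idea in Python
rather than the actual Perl.  The table contents, the fallback strings and
the function name are all made up for illustration; only the general shape
(look the command up in an exclusion list before converting it to Unicode)
reflects what is described above.

```python
# Illustrative tables only; the real data lives in Texinfo::Convert::Unicode
# and its contents may differ from what is shown here.
UNICODE_MAP = {
    "l": "\u0142",              # @l{}
    "dots": "\u2026",           # @dots{}
    "quotedblleft": "\u201c",   # @quotedblleft{}
    "quotedblright": "\u201d",  # @quotedblright{}
    "minus": "\u2212",          # @minus{}
}

# Hypothetical ASCII renderings used when a command is not converted.
ASCII_FALLBACK = {
    "l": "/l",
    "dots": "...",
    "quotedblleft": "``",
    "quotedblright": "''",
    "minus": "-",
}

# Stand-in for the exclusion list: punctuation-like commands only.
EXTRA_UNICODE = {"dots", "quotedblleft", "quotedblright", "minus"}


def convert_command(name, no_utf8_punctuation=False):
    """Return the output text for a brace command without arguments."""
    # Commands on the exclusion list stay ASCII when the option is set.
    if no_utf8_punctuation and name in EXTRA_UNICODE:
        return ASCII_FALLBACK[name]
    # Everything else, including letters such as @l{}, is still converted.
    return UNICODE_MAP.get(name, ASCII_FALLBACK.get(name, ""))
```

With these made-up tables, convert_command("l", True) still gives the
Unicode letter, while convert_command("dots", True) falls back to three
ASCII periods.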

