bug-texinfo

Re: (gnu)sed in texinfo


From: Mojca Miklavec
Subject: Re: (gnu)sed in texinfo
Date: Fri, 4 Jul 2014 18:27:50 +0200

On Fri, Jul 4, 2014 at 6:08 PM, Mihai Moldovan wrote:
> * On 04.07.2014 05:55 pm, Mojca Miklavec wrote:
>> [...]
>>
>> My other suggestions to fix the problem would be the following:
>>
>> a) either allow compiling texinfo with ./configure
>> --with-sed=/path/to/somesed, so that whenever texinfo is called, it
>> would use that variable to call "the right sed"; or use some other
>> trick somewhere to achieve the same
>
> That sounds great. autotools could substitute the correct sed binary easily, if
> scripts/binaries are generated from file.in (like texi2dvi.in).
>
>
>> b) set LC_CTYPE=C (I forgot the exact details, but the same is done in
>> LuaTeX to avoid problems in different areas); I believe that there is
>> no need to actually detect the encoding, I suspect that anything other
>> than UTF-8 (that is: C or some ISO encoding accepting any 8-bit char)
>> would be fine
>
> That will fail for UTF-8 encoded files, I guess.

Doesn't "C" (I forgot which one of the LC_* variables exactly needs to
be set) mean "process as is"? Also, UTF-8 is just "a special case" of
a generic 8-bit encoding. UTF-8-encoded text could be treated as (or
mistaken for) ISO Latin 1 (which would of course give wrong output if
interpreted), so it should work in principle, but see the other recent
emails (from today) on the tex-live mailing list about the encoding of
the environment.

I just checked and in LuaTeX sources the following two are used at
several places:
    setlocale(LC_ALL, "C");
    export LC_ALL=C
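
A minimal sketch of what the LuaTeX-style workaround would look like for a single sed call: prefix it with `LC_ALL=C` so sed treats its input as raw bytes. The input line is made up for illustration: "caf" followed by the single ISO Latin 1 byte 0xE9 ("é" in Latin 1), which is not valid UTF-8, and the substitution only touches ASCII.

```shell
#!/bin/sh
# Hypothetical input: "caf" + the Latin 1 byte 0xE9 (octal 351) + " @samp".
# This byte sequence is NOT valid UTF-8.
output=$(printf 'caf\351 @samp\n' | LC_ALL=C sed 's/@samp/SAMP/')

# In the C locale sed works byte by byte, so the stray 0xE9 byte is copied
# through unchanged and only the ASCII substitution is applied.
expected=$(printf 'caf\351 SAMP\n')
if [ "$output" = "$expected" ]; then
    echo "ok: byte 0xE9 passed through unchanged"
else
    echo "unexpected output" >&2
    exit 1
fi
```

Putting `export LC_ALL=C` near the top of a script would apply the same setting to every sed call in it; whether that is safe for the other tools such a script runs is a separate question.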

Feeding sed with ISO Latin 1 text while running in a UTF-8 locale should
indeed fail. (In a way I agree that a tool operating in "UTF-8 mode"
should start screaming when fed invalid UTF-8 input.)

But from what I understand, sed isn't used in texinfo to do anything
special with "real text" (non-ASCII)? (That is: the transformations
probably aren't of a type that would insert newlines, whitespace or
any other characters *in the middle of* UTF-8 characters and thereby
make the UTF-8 invalid?)
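
To make the distinction concrete, here is a small sketch (sample text made up) of UTF-8 input going through sed in the C locale. An ASCII-only substitution leaves the two bytes of "é" alone, so the result is still valid UTF-8; a pattern that counts characters with `.` matches single bytes in the C locale and can cut "é" in half.

```shell
#!/bin/sh
# "\303\251" is the two-byte UTF-8 encoding of "é".
utf8_line=$(printf 'caf\303\251 @var\n')

# 1) ASCII-only substitution: the bytes of "é" are never touched,
#    so the output is still valid UTF-8.
safe=$(printf '%s\n' "$utf8_line" | LC_ALL=C sed 's/@var/VAR/')

# 2) Byte-oriented pattern: in the C locale "." matches exactly one byte,
#    so keeping the first four "characters" keeps only the first byte
#    (0xC3) of "é" and leaves an invalid, truncated UTF-8 sequence.
cut=$(printf '%s\n' "$utf8_line" | LC_ALL=C sed 's/^\(....\).*/\1/')

printf 'safe: %s\n' "$safe"
printf 'cut (hex):'
printf '%s' "$cut" | od -An -tx1
```
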

> Hence, I asked for automatic
> input charset detection. As that's not possible, I opt for a.)

Indeed. Input charset detection is a no-go.
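
A rough sketch of option a): fix the sed binary once, when the script is generated from its `.in` template, the way config.status replaces `@VAR@` placeholders. Everything specific here is an assumption for illustration: the candidate names (GNU sed is commonly installed on macOS as `gsed`), `WITH_SED` standing in for the hypothetical `--with-sed=PATH` option, and the tiny template that stands in for something like texi2dvi.in.

```shell
#!/bin/sh
# Return the first candidate sed found on PATH (candidate names are assumptions).
pick_sed() {
    for candidate in gsed gnused sed; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

# An explicitly requested sed (hypothetical --with-sed=PATH) wins over detection.
if [ -n "${WITH_SED:-}" ]; then
    chosen_sed=$WITH_SED
else
    chosen_sed=$(pick_sed) || exit 1
fi

workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# Hypothetical template; the quoted 'EOF' keeps $1 and @SED@ literal here.
cat > "$workdir/myscript.in" <<'EOF'
#!/bin/sh
SED='@SED@'
printf '%s\n' "$1" | "$SED" -e 's/foo/bar/'
EOF

# "Build" step: substitute the chosen path (assumes it contains no '|' or '&').
sed "s|@SED@|$chosen_sed|g" "$workdir/myscript.in" > "$workdir/myscript"
chmod +x "$workdir/myscript"

# Run the generated script; it uses the sed chosen at build time.
"$workdir/myscript" foo
```

In a real Autoconf build the detection would more likely come from the `AC_PROG_SED` macro, which sets the `SED` output variable; note that it looks for a POSIX-conforming sed, not specifically GNU sed, so it would not by itself avoid the system sed on macOS.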

>> I totally agree that "there is no way to detect charsets reliably",
>> but I hope that "C" would work. Not using Apple's sed is a bit
>> difficult to set up. In configure scripts it is usually easy to do
>> something like "export SED=gnused" or "export MAKE=gmake" or things
>> like that. But "sed" is hardcoded in texinfo and it's a bit difficult
>> to change it.
>>
>> So I would rephrase the question: would it be possible to select
>> which SED is used in advance, or would it be possible to set some
>> environment variables that would make any given "sed" happy?
>
> Replacing calls to sed by a ${SED} variable which can be set to arbitrary values
> from the environment sounds fine, too (defaulting to the general "sed").
>
> That would, however, still require setting a SED env variable for all ports
> that break with BSD sed. Maintainers would also need to know about this...

Yes, having to set the SED variable before calling texinfo wouldn't be
convenient, and users would forget to do it.
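
For comparison, a minimal sketch of the ${SED} proposal: every sed call in a script goes through one variable that defaults to plain "sed" but can be overridden from the environment (e.g. `SED=gsed texi2dvi ...`). The helper function and its sed expression are hypothetical stand-ins for one of texinfo's real sed invocations.

```shell
#!/bin/sh
# Use $SED from the environment if set and non-empty, otherwise plain "sed".
: "${SED:=sed}"

# Hypothetical helper: drop Texinfo "@c " comment lines.
strip_comment_lines() {
    "$SED" -e '/^@c /d'
}

printf '@c a comment\n@node Top\n' | strip_comment_lines
```

Combined with option a), the fallback could be a configure-substituted path instead of plain "sed", so the variable would only be needed as an override and users who forget it would still get the sed chosen at build time.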

Mojca


