Re: (gnu)sed in texinfo
From: Mojca Miklavec
Subject: Re: (gnu)sed in texinfo
Date: Fri, 4 Jul 2014 18:27:50 +0200
On Fri, Jul 4, 2014 at 6:08 PM, Mihai Moldovan wrote:
> * On 04.07.2014 05:55 pm, Mojca Miklavec wrote:
>> [...]
>>
>> My other suggestions to fix the problem would be the following:
>>
>> a) either allow compiling texinfo with ./configure
>> --with-sed=/path/to/somesed, so that whenever texinfo is called, it
>> would use that variable to call "the right sed"; or use some other
>> trick somewhere to achieve the same
>
> That sounds great. autotools could substitute the correct sed binary easily,
> if scripts/binaries are generated from file.in (like texi2dvi.in).
>
>
>> b) set LC_CTYPE=C (I forgot the exact details, but the same is done in
>> LuaTeX to avoid problems in different areas); I believe that there is
>> no need to actually detect the encoding, I suspect that anything other
>> than UTF-8 (that is: C or some ISO encoding accepting any 8-bit char)
>> would be fine
>
> That will fail for UTF-8 encoded files, I guess.
Doesn't "C" (I forgot exactly which of the LC_* variables needs to be
set) mean "process the bytes as they are"? Also, UTF-8 is just "a special
case" of a generic 8-bit encoding: UTF-8-encoded text could be treated
as (or mistaken for) ISO Latin 1. The characters would come out wrong, of
course, but the processing itself should work. See also the other recent
emails about the encoding of the environment on the tex-live mailing list
(from today).
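To illustrate the point about UTF-8 being readable as a generic 8-bit encoding, here is a minimal Python sketch. The sample string is made up for the illustration; it only shows that interpreting UTF-8 bytes as ISO Latin 1 never raises an error and loses no bytes, even though the resulting characters are wrong.

```python
# Hypothetical sample text; any string with non-ASCII characters would do.
text = "naïve café"
utf8_bytes = text.encode("utf-8")

# ISO Latin 1 assigns a character to every byte value 0-255,
# so decoding arbitrary bytes with it cannot fail.
as_latin1 = utf8_bytes.decode("latin-1")

# The decoded text is wrong (each multi-byte character became two),
# but nothing was lost: re-encoding gives back the original bytes.
round_trip = as_latin1.encode("latin-1")

print(as_latin1 != text)         # the "wrong output" part
print(round_trip == utf8_bytes)  # the "process as is" part
```

This says nothing about what a particular sed does internally; it only shows why a byte-per-character view of UTF-8 input is lossless.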
I just checked, and in the LuaTeX sources the following two are used in
several places:
setlocale(LC_ALL, "C");
export LC_ALL=C
Feeding sed ISO Latin 1 text while it runs in a UTF-8 locale should
indeed fail. (In a way I agree that a tool operating in "UTF-8 mode"
should start screaming when fed invalid UTF-8 input.)
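The reverse direction can be sketched the same way. The helper name `is_valid_utf8` and the sample word are assumptions made for this illustration; the point is only that ISO Latin 1 bytes for non-ASCII characters are generally not valid UTF-8, so a strict UTF-8 consumer has a reason to complain.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: the same word in two encodings.
latin1_bytes = "café".encode("latin-1")  # 'é' is the single byte 0xE9
utf8_bytes = "café".encode("utf-8")      # 'é' is the two bytes 0xC3 0xA9

# In UTF-8, 0xE9 announces a three-byte sequence, but nothing follows it,
# so the Latin 1 bytes are rejected while the UTF-8 bytes are accepted.
print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```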
But from what I understood, sed isn't used in texinfo to do anything
special with "real text" (non-ASCII)? (That is: the transformations
probably aren't of a kind that would insert newlines, whitespace or
any other characters *in the middle of* UTF-8 characters and so make
the UTF-8 invalid?)
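A small Python sketch of that assumption follows. The input line and the substitution are invented for the example and are not taken from texinfo's scripts. It shows that a byte-wise replacement of an ASCII pattern by ASCII text leaves the surrounding UTF-8 intact, while inserting a byte inside a multi-byte character does not.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical input line containing non-ASCII text.
utf8_line = "@section Čeština\n".encode("utf-8")

# Byte-wise, ASCII-only substitution, roughly what a byte-oriented tool
# would do for something like s/@section/@chapter/.
edited = utf8_line.replace(b"@section", b"@chapter")

# Counter-example: a newline inserted between the two bytes of 'Č'.
c_caron = "Č".encode("utf-8")
broken = c_caron[:1] + b"\n" + c_caron[1:]

print(is_valid_utf8(edited))  # ASCII-only edits keep the text valid
print(is_valid_utf8(broken))  # splitting a character does not
```

The first case works because every byte of a multi-byte UTF-8 character has its high bit set, so an ASCII pattern can never match in the middle of one.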
> Hence, I asked for automatic
> input charset detection. As that's not possible, I opt for a.)
Indeed. Input charset detection is a no-go.
>> I totally agree that "there is no way to detect charsets reliably",
>> but I hope that "C" would work. Not using Apple's sed is a bit
>> difficult to set up. In configure scripts it is usually easy to do
>> something like "export SED=gnused" or "export MAKE=gmake" or things
>> like that. But "sed" is hardcoded in texinfo and it's a bit difficult
>> to change it.
>>
>> So I would rephrase the question: would it be possible to select
>> which SED is used in advance or would it be possible to set some
>> environment variables that would make any given "sed" happy?
>
> Replacing calls to sed by a ${SED} variable which can be set to arbitrary
> values from the environment sounds fine, too (defaulting to the general "sed").
>
> That would, however, still require setting a SED env variable for all ports
> breaking with bsdsed. Maintainers would also need to know about this...
Yes, having to set the SED variable before calling texinfo wouldn't be
comfortable and users would forget that.
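For reference, the "SED variable with a default" idea can be sketched like this in Python. The function name `sed_command` is hypothetical and this is not how texinfo is implemented; it only illustrates the lookup order being discussed, where an unset (or empty) SED falls back to plain "sed".

```python
import os


def sed_command(env=None):
    """Return the sed program to run (hypothetical helper).

    Uses the SED environment variable when it is set and non-empty,
    otherwise falls back to plain "sed".
    """
    if env is None:
        env = os.environ
    # Comparable to ${SED:-sed} in a shell script.
    return env.get("SED") or "sed"


# Usage with explicit environments, so the result does not depend
# on the machine running the example.
print(sed_command({}))
print(sed_command({"SED": "gnused"}))
```

The drawback is visible in the first call: anyone who forgets to set SED silently gets whatever "sed" is first in the PATH.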
Mojca