Re: \c escape within $'...' can produce mangled UTF-8


From: Dennis Williamson
Subject: Re: \c escape within $'...' can produce mangled UTF-8
Date: Sun, 15 Aug 2010 03:02:44 -0500

> This leap of illogic is beyond my ken. As a counterexample, "\x{...}"
> escape can consume an unlimited number of bytes while producing a
> single byte.

On my system it consumes at most two hex digits (or only one if the
digit is followed by another escape or a closing quote).

> Because the documentation says "backslash-escaped _characters_", and
> not "bytes"? ;-)

"Backslash-escaped characters" refers to the "c" in "\c", not to the
characters that follow it. The plural refers to the whole set of
escape sequences. The man page also says " \cx    a control-x
character", which implies "x & 0x1F", which in turn implies that "x" is
a single byte. Perhaps the documentation could be more explicit about this.
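
To make the "x & 0x1F" reading concrete, here is a minimal sketch of the
arithmetic only. It assumes the mask is applied to exactly one byte;
ctrl_of is a hypothetical helper written for this illustration, not
something taken from the Bash source, and the real implementation may
differ in details.

```shell
# Hypothetical helper: apply the "x & 0x1F" reading of \cx to a single
# byte value and print the result as two uppercase hex digits.
ctrl_of() {
    printf '%02X\n' "$(( $1 & 0x1F ))"
}

ctrl_of 0x41   # 'A': prints 01, i.e. control-A
ctrl_of 0xD0   # lead byte of a two-byte UTF-8 sequence: prints 10
```

Under this reading, a two-byte UTF-8 character whose first byte is 0xD0
has that byte turned into 0x10, while its second byte is left behind on
its own, which is no longer valid UTF-8.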

It's the responsibility of your code to put an ASCII character after
the \c. Bash has no way to guess whether the 0xD0 is meant as the first
byte of a multibyte Unicode character or simply as the byte that it is.

On Sat, Aug 14, 2010 at 4:08 PM, Dmitry Groshev
<wjaguar@users.sourceforge.net> wrote:
> On 15/08/2010, Chet Ramey <chet.ramey@case.edu> wrote:
>> I'm not sure why you think this is a bug.
>
> Because the documentation says "backslash-escaped _characters_", and
> not "bytes"? ;-)
>
>> The \c escape is described
>> as converting to a control character; control characters are always a
>> single byte; the conversion to a control character therefore consumes
>> one byte.
>
> This leap of illogic is beyond my ken. As a counterexample, "\x{...}"
> escape can consume an unlimited number of bytes while producing a
> single byte.
>
>> It's not the business of $'...' conversion to ensure that
>> the result is a valid multibyte character string.
>
> Is its business to produce invalid UTF when given a nonsense escape,
> then? And is the rest of code quite prepared to deal with invalid
> multibyte chars springing into existence at this point?
> If an escape's parameter makes no sense, escape sequence should be
> left untranslated - just the way "\x" handles things like "\xZZ". Make
> "\c" check that its parameter is an ASCII char, and the problem will
> be fixed.
> Unless for some reason you consider this bug worth preserving. :-)
>
>
> --
> -= With best regards, Dmitry Groshev =-
>
>


