Re: \c escape within $'...' can produce mangled UTF-8

From: Dmitry Groshev
Subject: Re: \c escape within $'...' can produce mangled UTF-8
Date: Sun, 15 Aug 2010 01:08:21 +0400

On 15/08/2010, Chet Ramey <address@hidden> wrote:
> I'm not sure why you think this is a bug.

Because the documentation says "backslash-escaped _characters_", and
not "bytes"? ;-)

> The \c escape is described
> as converting to a control character; control characters are always a
> single byte; the conversion to a control character therefore consumes
> one byte.
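A simplified model makes the one-byte rule concrete. The sketch below assumes the conventional control-character mapping ("?" becomes DEL, ASCII letters are upper-cased, the low five bits are kept). The function names are hypothetical, and this is not bash's source. Applied to "\c" followed by the UTF-8 bytes for "é" (C3 A9), it turns the lead byte C3 into 03 and leaves the continuation byte A9 stranded, which no UTF-8 decoder accepts.

```python
def to_ctrl(b: int) -> int:
    """Map one byte to a control character (simplified, assumed rule)."""
    if b == ord("?"):
        return 0x7F                  # \c? conventionally yields DEL
    if ord("a") <= b <= ord("z"):
        b -= 0x20                    # fold ASCII lowercase to uppercase
    return b & 0x1F                  # keep the low five bits


def expand_ctrl_bytewise(data: bytes) -> bytes:
    """Expand each "\\c" plus the ONE byte after it; copy all else verbatim."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\\c" and i + 2 < len(data):
            out.append(to_ctrl(data[i + 2]))   # consumes exactly one byte
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


# "\c" followed by U+00E9, whose UTF-8 encoding is the two bytes C3 A9
src = b"\\c" + "\u00e9".encode("utf-8")
result = expand_ctrl_bytewise(src)
print(result)                             # b'\x03\xa9'

try:
    result.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)   # invalid UTF-8: invalid start byte
```

For ASCII input the model behaves as expected: "\ca" yields 0x01 and "\c[" yields ESC (0x1B).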

This leap of illogic is beyond my ken. As a counterexample, the "\x{...}"
escape can consume an unlimited number of bytes while producing a
single byte.
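The counterexample can be modeled with a short sketch. It assumes a ksh93-style "\x{...}" form that reads hex digits up to the closing brace and keeps only the low 8 bits of the value. The truncation rule and the name parse_x_brace are illustrative assumptions, not taken from bash's source. The point is only that input length and output length are independent: eleven input bytes produce one output byte.

```python
import string


def parse_x_brace(data: bytes, i: int) -> tuple[int, int]:
    """Parse a hypothetical "\\x{HHH...}" escape starting at data[i].

    Returns (byte_value, index_just_past_the_escape).
    """
    if data[i:i + 3] != b"\\x{":
        raise ValueError("not a \\x{...} escape")
    j = i + 3
    value = 0
    # Read any number of hex digits (assumption: no upper limit).
    while j < len(data) and chr(data[j]) in string.hexdigits:
        value = value * 16 + int(chr(data[j]), 16)
        j += 1
    if j < len(data) and data[j] == ord("}"):
        j += 1                       # swallow the closing brace
    return value & 0xFF, j           # assumed: truncate to one byte


value, end = parse_x_brace(b"\\x{0000041}", 0)
print(value, end)                    # 65 11
```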

> It's not the business of $'...' conversion to ensure that
> the result is a valid multibyte character string.

Is it its business, then, to produce invalid UTF-8 when given a nonsense
escape? And is the rest of the code quite prepared to deal with invalid
multibyte characters springing into existence at this point?
If an escape's parameter makes no sense, the escape sequence should be
left untranslated, just the way "\x" handles things like "\xZZ". Make
"\c" check that its parameter is an ASCII character, and the problem will
be fixed.
Unless for some reason you consider this bug worth preserving. :-)
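The proposed fix can be sketched the same way. This is a hypothetical model of the suggestion above, not a patch to bash, and it only handles "\c" (other escapes are ignored). The function expand_ctrl_checked translates "\c" only when the following byte is ASCII (below 0x80) and otherwise copies the sequence through untouched, so a multibyte character after "\c" survives intact.

```python
def to_ctrl(b: int) -> int:
    """Map one byte to a control character (simplified, assumed rule)."""
    if b == ord("?"):
        return 0x7F                  # \c? conventionally yields DEL
    if ord("a") <= b <= ord("z"):
        b -= 0x20                    # fold ASCII lowercase to uppercase
    return b & 0x1F                  # keep the low five bits


def expand_ctrl_checked(data: bytes) -> bytes:
    """Translate "\\c" only when the next byte is ASCII (< 0x80)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if (data[i:i + 2] == b"\\c" and i + 2 < len(data)
                and data[i + 2] < 0x80):
            out.append(to_ctrl(data[i + 2]))
            i += 3
        else:
            # Not an escape, or "\c" before a non-ASCII byte: copy verbatim,
            # just as an invalid "\xZZ" is left untranslated.
            out.append(data[i])
            i += 1
    return bytes(out)


src = b"\\c" + "\u00e9".encode("utf-8")
fixed = expand_ctrl_checked(src)
print(fixed)                                   # b'\\c\xc3\xa9'
print(fixed.decode("utf-8") == "\\c\u00e9")    # True
print(expand_ctrl_checked(b"\\c["))            # b'\x1b'
```

The check costs one comparison per escape and keeps the output valid UTF-8 whenever the input was.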

-= With best regards, Dmitry Groshev =-
