bug-bash

Re: \c escape within $'...' can produce mangled UTF-8


From: Dmitry Groshev
Subject: Re: \c escape within $'...' can produce mangled UTF-8
Date: Sun, 15 Aug 2010 14:02:05 +0400

On 15/08/2010, Dennis Williamson <dennistwilliamson@gmail.com> wrote:
> It only consumes two bytes on my system (or one if it's followed by
> another escape or a closing quote).

You are wrong. Try "echo $'\x{123456}AB'" and look at the result.
Or read the source code: lib/sh/strtrans.c
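
For anyone who wants to look at the raw bytes on their own system,
something along these lines will do (Cyrillic "и", U+0438, encoded as
0xd0 0xb8 in UTF-8, is only an example -- any multibyte character
shows it; the variable is just to keep the result out of word
splitting):

    x=$'\cи'; printf '%s' "$x" | od -An -tx1

On a bash that behaves as described in the subject line, the
continuation byte 0xb8 comes out detached from its lead byte 0xd0,
and that is exactly the mangled UTF-8.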

> "Backslash-escaped characters" refers to the "c" in "\c" not the
> characters that follow it.

Given that the documentation doesn't say anything like that anywhere,
and given that _every other escape_ operates on characters (accepting
only ASCII characters and leaving multibyte ones alone), inventing an
exception specifically for "\c" would look quite contrived.
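
(Easy to check with any other escape, e.g. in a UTF-8 locale:

    printf '%s' $'\aи' | od -An -tx1

shows 07 d0 b8 -- the BEL byte from "\a" plus both bytes of "и"
untouched. Only "\c" reaches ahead and grabs part of the character.)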

> It's the responsibility of your code to put an ASCII character after
> the \c.

My code is fine, thank you. ;-) I have never had any use for "\c"
when "\x" is available. I found this weirdness in the Bash source
code while writing my own function for interpreting (some of) shell
syntax.
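
(For completeness: by the "\x" route I mean spelling out the UTF-8
bytes directly, something like this for the same U+0438:

    printf '%s' $'\xd0\xb8' | od -An -tx1

which should dump the two bytes intact, d0 b8 -- a valid sequence.)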

> There's no way for Bash to guess that the 0xD0 is part of a
> Unicode character or the byte that it is.

Every byte between 0x80 and 0xFF is part of a (possibly invalid)
multibyte sequence in UTF-8. Read up on the UTF-8 encoding, and don't
make wrong guesses again.
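
A quick way to see the structure (again assuming a UTF-8 locale, with
"и" / U+0438 as the example):

    printf 'и' | od -An -tx1

prints d0 b8: 0xd0 is a lead byte announcing a two-byte sequence, and
0xb8 is a continuation byte. Continuation bytes always lie in the
0x80-0xBF range, so a lone 0xb8 (or a lone 0xd0) is never valid UTF-8
by itself, no matter what precedes it.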

-- 
-= With best regards, Dmitry Groshev =-


