Re: \c escape within $'...' can produce mangled UTF-8
From: Dmitry Groshev
Subject: Re: \c escape within $'...' can produce mangled UTF-8
Date: Sun, 15 Aug 2010 14:02:05 +0400
On 15/08/2010, Dennis Williamson <dennistwilliamson@gmail.com> wrote:
> It only consumes two bytes on my system (or one if it's followed by
> another escape or a closing quote).
You are wrong. Try "echo $'\x{123456}AB'" and look at the result.
Or read the source code: lib/sh/strtrans.c
> "Backslash-escaped characters" refers to the "c" in "\c" not the
> characters that follow it.
Given that the documentation doesn't say anything like that anywhere, and
given that _every other escape_ operates on characters (accepting only
ASCII chars and leaving multibyte ones alone), inventing an
exception specifically for "\c" would look quite contrived.
> It's the responsibility of your code to put an ASCII character after
> the \c.
My code is fine, thank you. ;-) I have never had any use for
"\c" when there is "\x".
Instead, I found this weirdness in the Bash source code while writing my
own function for interpreting (some of the) shell syntax.
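
To make the weirdness concrete, here is a minimal sketch of what happens when a control-character mapping is applied to a single byte rather than to a whole character. It assumes that "\c" maps the next byte X to (X & 0x1f); that mapping is an assumption for illustration, not something confirmed in this thread, and ctrl_mask is a hypothetical helper, not a Bash function.

```shell
# Hypothetical helper: apply a control-character mask to one byte value.
# ASSUMPTION: "\c" turns the next byte X into (X & 0x1f).
ctrl_mask() {
    printf '0x%02X\n' $(( $1 & 0x1f ))
}

ctrl_mask 0x41   # 'A', an ASCII byte: gives 0x01 (Ctrl-A)
ctrl_mask 0xD0   # first byte of a two-byte UTF-8 sequence: gives 0x10
```

Under that assumption, a character encoded as 0xD0 0xA0 would come out as 0x10 followed by a stray 0xA0, which is no longer valid UTF-8.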
> There's no way for Bash to guess that the 0xD0 is part of a
> Unicode character or the byte that it is.
Everything between 0x80 and 0xFF is part of a (possibly invalid)
multibyte sequence in UTF-8. Read up on the UTF-8 encoding, and don't
make wrong guesses again.
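
The byte ranges can be sketched as a small classifier. This is only an illustration of the UTF-8 layout, not code from Bash; utf8_byte_class is a hypothetical name, and the thresholds are written in decimal so the function also runs in a plain POSIX sh.

```shell
# Hypothetical helper: classify one byte value by its role in UTF-8.
# Accepts decimal or 0x-prefixed hex, e.g. utf8_byte_class 0xD0
utf8_byte_class() {
    b=$(( $1 ))
    if   [ "$b" -lt 128 ]; then echo ascii          # 0x00-0x7F
    elif [ "$b" -lt 192 ]; then echo continuation   # 0x80-0xBF
    elif [ "$b" -lt 194 ]; then echo invalid        # 0xC0-0xC1 (overlong forms)
    elif [ "$b" -lt 224 ]; then echo lead2          # 0xC2-0xDF
    elif [ "$b" -lt 240 ]; then echo lead3          # 0xE0-0xEF
    elif [ "$b" -lt 245 ]; then echo lead4          # 0xF0-0xF4
    else                        echo invalid        # 0xF5-0xFF
    fi
}

utf8_byte_class 0x41   # ascii
utf8_byte_class 0xD0   # lead2
utf8_byte_class 0xA0   # continuation
```

No byte at or above 0x80 is ever classified as a standalone character, so a parser can tell from the byte value alone that 0xD0 does not stand by itself.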
--
-= With best regards, Dmitry Groshev =-