Re: \c escape within $'...' can produce mangled UTF-8

bug-bash

From:	Chet Ramey
Subject:	Re: \c escape within $'...' can produce mangled UTF-8
Date:	Sat, 14 Aug 2010 16:19:57 -0400
User-agent:	Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6

On 8/14/10 3:01 PM, Dmitry Groshev wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: i686
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu'
> -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/local/share/locale'
> -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include
> -I./lib   -g -O2
> uname output: Linux wjlair 2.6.24.5-smp #1 SMP Fri Aug 14 19:13:09 MSD
> 2009 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ AuthenticAMD
> GNU/Linux
> Machine Type: i686-pc-linux-gnu
> 
> Bash Version: 4.1
> Patch Level: 0
> Release Status: release
> 
> Description:
>         In a UTF-8 locale (such as ru_RU.UTF-8), a \c escape within
> $'...' results in an invalid UTF-8 string if followed by a multibyte
> UTF-8 character: ansicstr() in lib/sh/strtrans.c consumes and converts
> the character's first byte, leaving the rest of the UTF-8 sequence as is.

I'm not sure why you think this is a bug.  The \c escape is described
as converting to a control character; control characters are always a
single byte; the conversion to a control character therefore consumes
one byte.  It's not the business of $'...' conversion to ensure that
the result is a valid multibyte character string.
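
To make the byte-level effect concrete, here is a minimal Python model of
the behavior described above. It is a sketch, not bash's ansicstr(): the
helper name ctrl_escape and the 0x1f mask (the conventional Ctrl-key
mapping) are assumptions made for illustration only.

```python
def ctrl_escape(rest: bytes) -> bytes:
    # Hypothetical model of "\c" followed by the bytes in `rest`.
    # Only the first byte is consumed and converted; the remaining
    # bytes are passed through untouched.
    if not rest:
        return b""
    # Assumption: a control character is formed by keeping the low
    # five bits of the byte (so 'A' becomes 0x01).
    ctrl = rest[0] & 0x1F
    return bytes([ctrl]) + rest[1:]


def is_valid_utf8(data: bytes) -> bool:
    # True if `data` decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    ascii_case = ctrl_escape(b"A")
    # Cyrillic "zhe" is the two-byte sequence D0 B6 in UTF-8.
    multibyte_case = ctrl_escape("\u0436".encode("utf-8"))
    print(ascii_case.hex(), is_valid_utf8(ascii_case))
    print(multibyte_case.hex(), is_valid_utf8(multibyte_case))
```

In this model the ASCII case yields a single control byte, while the
two-byte case converts only the lead byte and leaves a stray continuation
byte behind, which is why the result no longer decodes as UTF-8.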

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
