bug-bash

Re: Fwd: Non-upstream patches for bash (2014)


From: George
Subject: Re: Fwd: Non-upstream patches for bash (2014)
Date: Sat, 24 Jun 2017 16:46:47 -0400

On Sat, 2017-06-24 at 12:41 -0500, Eduardo A. Bustamante López wrote:
> I was looking through this old thread:
> http://seclists.org/oss-sec/2014/q3/851
> 
> It looks like the issue reported in there is still there:
> 
>   dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK bash
>   �\
>   dualbus@debian:~$ LANG=en_US.UTF8 printf 'echo \u4e57\n' |LANG=en_US.UTF8 bash
>   乗
>   dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK mksh
>   �
>   dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK ksh
>   �\
>   dualbus@debian:~$ LANG=zh_CN.GBK printf 'echo \u4e57\n' |LANG=zh_CN.GBK zsh
>   �
> (In the case that your font doesn't render the glyph for U+4E57, it's:
> http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4e57)
> 
>   dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
>    81 5c
> 
> It looks like it doesn't detect that \x81\x5c is a single character, and
> instead treats the multibyte character as separate characters.
> 
I'm not seeing the problem here (at least, not in Bash or ksh; mksh and zsh
seem to have gotten it wrong).

Bash and ksh (in the GBK locale) are outputting $'\u4E57' as a two-byte
sequence (0x81, 0x5C). You're then reading that back into bash and ksh
(respectively) under the same locale, and that same two-byte sequence is being
retained. If your terminal were in a GBK or GB18030 locale, the character
would be displayed correctly, too.

That said, this does not seem to work as well:
$ LANG=zh_CN.GBK printf "echo \$'%s'" $'\u4e57n' |LANG=zh_CN.GBK bash
�
This test checks whether the 0x5c byte (the second byte of the multi-byte
character) is treated as a backslash inside the $'..' quoting syntax. In this
case it is: the 0x5c byte and the following 'n' are read as a '\n' escape
sequence and turned into a newline.
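
For anyone who wants to see the byte-level ambiguity without installing a GBK
locale, here is a minimal sketch. It builds the two bytes from the od output
quoted above by hand and shows that a byte-wise scan finds a backslash inside
the character. The helper names hex_bytes and has_backslash_byte are made up
for this illustration, and the sketch says nothing about how bash's parser is
actually implemented:

```shell
#!/bin/sh
# U+4E57 in GBK is the byte pair 0x81 0x5c (per the od output quoted above).
# Octal escapes are used so that any POSIX printf produces the same bytes.
gbk_char=$(printf '\201\134')

# hex_bytes STRING (hypothetical helper): print the bytes of STRING as
# contiguous lowercase hex digits.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# has_backslash_byte STRING (hypothetical helper): succeed if a byte-wise
# scan of STRING finds a 0x5c byte. The function body is a subshell, so the
# LC_ALL=C override does not leak into the caller.
has_backslash_byte() (
    LC_ALL=C
    case $1 in
        *\\*) exit 0 ;;
        *)    exit 1 ;;
    esac
)

hex_bytes "$gbk_char"; echo
if has_backslash_byte "$gbk_char"; then
    echo "byte-wise scan sees a backslash inside the GBK character"
fi
```

A scanner that is aware of the locale's multibyte encoding would consume 0x81
and 0x5c together as one character and never look at the second byte on its
own.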

