Re: bash-4.3: casemod word expansions broken with UTF-8
From: isabella parakiss
Subject: Re: bash-4.3: casemod word expansions broken with UTF-8
Date: Tue, 17 Nov 2015 01:28:45 +0100
On 11/15/15, Ulrich Mueller <ulm@gentoo.org> wrote:
> Description:
> In a UTF-8 locale like en_US.UTF-8, the case-modifying
> parameter expansions sometimes return invalid UTF-8 encodings.
>
> This seems to happen when the UTF-8 byte sequences that are
> encoding upper and lower case have different lengths.
>
> Repeat-By:
> $ LC_ALL=en_US.UTF-8
> $ x=$'\xc4\xb1' # LATIN SMALL LETTER DOTLESS I
> $ echo -n "${x^}" | od -t x1
> 0000000 49 b1
> 0000002
>
> This should have output "49" for "I" only. The "b1" is illegal
> as the first byte of a UTF-8 sequence.
>
> $ x=$'\xe1\xba\x9e' # LATIN CAPITAL LETTER SHARP S
> $ echo -n "${x,}" | od -t x1
> 0000000 c3 9f 9e
> 0000003
>
> This should have output "c3 9f" (for "sharp s") only.
>
Both examples should produce the expected output in bash-4.4-beta.
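
If you want to check a particular build yourself, something along these lines might do. It is only a rough sketch: hex_of and report are helper names made up for this mail, it assumes the en_US.UTF-8 locale is installed and that bash is 4.0 or later, and the expected bytes are the ones given in the report above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for this sketch; not part of bash itself.

# Print the bytes of a string as lowercase hex with no separators.
hex_of() {
    printf '%s' "$1" | od -An -t x1 | tr -d '[:space:]'
}

# report LABEL ACTUAL_STRING EXPECTED_HEX
# Compare the bytes of ACTUAL_STRING against EXPECTED_HEX and say which we got.
report() {
    local got
    got=$(hex_of "$2")
    if [[ $got == "$3" ]]; then
        echo "$1: ok ($got)"
    else
        echo "$1: unexpected bytes $got (wanted $3)"
    fi
}

LC_ALL=en_US.UTF-8    # assumption: this locale exists on the system

x=$'\xc4\xb1'         # LATIN SMALL LETTER DOTLESS I
report 'dotless i, ${x^}' "${x^}" 49

x=$'\xe1\xba\x9e'     # LATIN CAPITAL LETTER SHARP S
report 'capital sharp s, ${x,}' "${x,}" c39f
```

The script compares raw bytes instead of printed characters, so a stray continuation byte like the "b1" above shows up even when the terminal hides it.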
---
xoxo iza