bug-bash

Re: bash-4.3: casemod word expansions broken with UTF-8


From: isabella parakiss
Subject: Re: bash-4.3: casemod word expansions broken with UTF-8
Date: Tue, 17 Nov 2015 01:28:45 +0100

On 11/15/15, Ulrich Mueller <address@hidden> wrote:
> Description:
>       In a UTF-8 locale like en_US.UTF-8, the case-modifying
>       parameter expansions sometimes return invalid UTF-8 encodings.
>
>       This seems to happen when the UTF-8 byte sequences that are
>       encoding upper and lower case have different lengths.
>
> Repeat-By:
>       $ LC_ALL=en_US.UTF-8
>       $ x=$'\xc4\xb1' # LATIN SMALL LETTER DOTLESS I
>       $ echo -n "${x^}" | od -t x1
>       0000000 49 b1
>       0000002
>
>       This should have output "49" for "I" only. The "b1" is illegal
>       as the first byte of a UTF-8 sequence.
>
>       $ x=$'\xe1\xba\x9e' # LATIN CAPITAL LETTER SHARP S
>       $ echo -n "${x,}" | od -t x1
>       0000000 c3 9f 9e
>       0000003
>
>       This should have output "c3 9f" (for "sharp s") only.
>

Both examples should work as expected in bash-4.4-beta.
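The report attributes the bad output to upper and lower case having UTF-8 encodings of different lengths. The Python sketch below illustrates that. It is not bash's code. `casemod_first_in_place` is a hypothetical model that writes the mapped character over the original bytes as if both were the same width. `casemod_first` decodes the string, maps the first code point and re-encodes it. Python's `str.upper()`/`str.lower()` apply full Unicode case mapping (for example, `'ß'.upper() == 'SS'`), so the helper `simple_map` approximates a one-to-one `towupper`/`towlower`-style mapping. That approximation is an assumption.

```python
from typing import Callable


def simple_map(ch: str, mapper: Callable[[str], str]) -> str:
    """Approximate a one-to-one (towupper/towlower-style) case mapping.

    str.upper()/str.lower() may expand one character into several;
    keep the original character whenever the mapping is not one-to-one.
    """
    mapped = mapper(ch)
    return mapped if len(mapped) == 1 else ch


def casemod_first_in_place(data: bytes, mapper: Callable[[str], str]) -> bytes:
    """Hypothetical model of the bug: overwrite the first character's bytes
    in place, assuming both cases encode to the same number of bytes."""
    text = data.decode("utf-8")
    if not text:
        return data
    new = simple_map(text[0], mapper).encode("utf-8")
    buf = bytearray(data)
    buf[: len(new)] = new  # the original character's byte length is never consulted
    return bytes(buf)


def casemod_first(data: bytes, mapper: Callable[[str], str]) -> bytes:
    """Length-aware version: decode, map the first code point, re-encode."""
    text = data.decode("utf-8")
    if not text:
        return data
    return (simple_map(text[0], mapper) + text[1:]).encode("utf-8")


dotless_i = b"\xc4\xb1"        # U+0131 LATIN SMALL LETTER DOTLESS I
capital_sharp_s = b"\xe1\xba\x9e"  # U+1E9E LATIN CAPITAL LETTER SHARP S

print(casemod_first_in_place(dotless_i, str.upper).hex(" "))        # 49 b1
print(casemod_first(dotless_i, str.upper).hex(" "))                 # 49
print(casemod_first_in_place(capital_sharp_s, str.lower).hex(" "))  # c3 9f 9e
print(casemod_first(capital_sharp_s, str.lower).hex(" "))           # c3 9f
```

The in-place model reproduces exactly the byte sequences shown in the report, `49 b1` and `c3 9f 9e`. The length-aware version yields the expected `49` and `c3 9f`.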


---
xoxo iza
