Re: bash-4.3: casemod word expansions broken with UTF-8

bug-bash

From:	isabella parakiss
Subject:	Re: bash-4.3: casemod word expansions broken with UTF-8
Date:	Tue, 17 Nov 2015 01:28:45 +0100

On 11/15/15, Ulrich Mueller <ulm@gentoo.org> wrote:
> Description:
>       In a UTF-8 locale such as en_US.UTF-8, the case-modifying
>       parameter expansions sometimes return invalid UTF-8 encodings.
>
>       This seems to happen when the UTF-8 byte sequences that are
>       encoding upper and lower case have different lengths.
>
> Repeat-By:
>       $ LC_ALL=en_US.UTF-8
>       $ x=$'\xc4\xb1' # LATIN SMALL LETTER DOTLESS I
>       $ echo -n "${x^}" | od -t x1
>       0000000 49 b1
>       0000002
>
>       This should have output only "49" (for "I"). The byte "b1" is
>       illegal as the first byte of a UTF-8 sequence.
>
>       $ x=$'\xe1\xba\x9e' # LATIN CAPITAL LETTER SHARP S
>       $ echo -n "${x,}" | od -t x1
>       0000000 c3 9f 9e
>       0000003
>
>       This should have output "c3 9f" (for "sharp s") only.
>
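The length mismatch described in the report can be made concrete by counting bytes. The sketch below is an illustration, not part of the original report. It compares the byte length of each reported character with the byte length of the character it should map to. It relies on the common bash idiom of a function-local LC_ALL=C, under which ${#var} counts bytes instead of characters. The helpers byte_len and show_lengths are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustration of the length mismatch behind the report: each character
# and its case counterpart need a different number of bytes in UTF-8.
# byte_len and show_lengths are hypothetical helper names.

# Print the length of $1 in bytes. A function-local LC_ALL=C switches
# bash to a single-byte locale, so ${#1} counts bytes, not characters.
byte_len() {
    local LC_ALL=C
    printf '%d\n' "${#1}"
}

# Print "<label>: <bytes of from> -> <bytes of to> bytes".
show_lengths() {
    local label=$1 from=$2 to=$3
    printf '%s: %d -> %d bytes\n' "$label" "$(byte_len "$from")" "$(byte_len "$to")"
}

# LATIN SMALL LETTER DOTLESS I uppercases to plain ASCII "I".
show_lengths 'U+0131 -> U+0049' $'\xc4\xb1' 'I'
# LATIN CAPITAL LETTER SHARP S lowercases to LATIN SMALL LETTER SHARP S.
show_lengths 'U+1E9E -> U+00DF' $'\xe1\xba\x9e' $'\xc3\x9f'
```

Both mappings shrink the result, by one byte each. That matches the stray trailing bytes ("b1" and "9e") in the two od dumps above.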

Both examples should work as expected in bash-4.4-beta.
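Checking a given build against the report can be done with a small regression script. The sketch below is not from the thread. It assumes bash 4.4 or later and an installed en_US.UTF-8 locale. It reuses the report's od-based approach and compares the resulting hex bytes with the expected values. The helpers to_hex and check_casemod are hypothetical names.

```shell
#!/usr/bin/env bash
# Regression check for the casemod expansions in the report above.
# Assumes bash >= 4.4 and an installed en_US.UTF-8 locale; if the locale
# is missing, bash typically warns and the UTF-8 checks are not meaningful.
# to_hex and check_casemod are hypothetical helper names.

# Print the bytes of $1 as space-separated hex, as in the report's
# od -t x1 output but without the address column.
to_hex() {
    printf '%s' "$1" | od -An -t x1 | xargs
}

# check_casemod INPUT OP EXPECTED_HEX
# Apply the case-modification operator OP (^, ^^, , or ,,) to INPUT and
# compare the resulting bytes with EXPECTED_HEX. Returns 1 on mismatch.
check_casemod() {
    local input=$1 op=$2 expected=$3 result got
    case $op in
        '^')  result=${input^} ;;
        '^^') result=${input^^} ;;
        ',')  result=${input,} ;;
        ',,') result=${input,,} ;;
        *)    printf 'unknown operator: %s\n' "$op" >&2; return 2 ;;
    esac
    got=$(to_hex "$result")
    if [[ $got == "$expected" ]]; then
        printf 'ok   %s -> %s\n' "$op" "$got"
    else
        printf 'FAIL %s -> %s (expected %s)\n' "$op" "$got" "$expected"
        return 1
    fi
}

LC_ALL=en_US.UTF-8
# LATIN SMALL LETTER DOTLESS I: ${x^} should yield only "I" (49).
check_casemod $'\xc4\xb1' '^' '49'
# LATIN CAPITAL LETTER SHARP S: ${x,} should yield only "c3 9f".
check_casemod $'\xe1\xba\x9e' ',' 'c3 9f'
```

On a build that has the bug, the report suggests the FAIL lines would show the extra "b1" and "9e" bytes. Comparing hex strings instead of printing the raw result avoids sending invalid UTF-8 to the terminal.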


---
xoxo iza
