Re: bash-4.3: casemod word expansions broken with UTF-8

bug-bash

From:	Ulrich Mueller
Subject:	Re: bash-4.3: casemod word expansions broken with UTF-8
Date:	Sun, 15 Nov 2015 17:56:59 +0100

>>>>> On Sun, 15 Nov 2015, Ulrich Mueller wrote:

> Description:
>       In a UTF-8 locale like en_US.UTF-8, the case-modifying
>       parameter expansions sometimes return invalid UTF-8 encodings.

>       This seems to happen when the UTF-8 byte sequences that
>       encode the upper- and lowercase forms have different lengths.

Even more interesting effects happen if the string contains a
character whose UTF-8 encoding gets *longer* after case conversion,
because the result then no longer fits in the space of the original
string and the terminating null byte is overwritten.

For example, U+0250 "LATIN SMALL LETTER TURNED A" is represented by a
two-byte sequence in UTF-8 (c9 90), while its uppercase equivalent,
U+2C6F "LATIN CAPITAL LETTER TURNED A", needs three bytes:

        $ LC_ALL=en_US.UTF-8
        $ x=$'aaaaa\xc9\x90'
        $ y=${x^^}
        $ echo -n "$y" | od -t x1
        0000000 41 41 41 41 41 e2 90 af 6f 6d 65 2f 75 6c 6d
        0000017

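y contains trailing garbage: the bytes 6f 6d 65 2f 75 6c 6d spell
"ome/ulm", which could be part of $HOME or $PWD. Even the converted
character is wrong: the correct UTF-8 encoding of U+2C6F is e2 b1 af,
not e2 90 af.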
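To make the byte arithmetic concrete, here is a minimal shell sketch. It
does not call bash's case conversion at all; it hard-codes the UTF-8
bytes of U+0250 and U+2C6F (as octal printf escapes) and counts bytes
with wc -c. The buffer model, input bytes plus one for the terminating
null, is an assumption inferred from the overwritten null byte, not
taken from bash's source, and byte_len is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: why a case-converted UTF-8 string can outgrow its buffer.
# ASSUMPTION (hypothetical model, not bash's actual code): the result
# buffer is sized as the input's byte length plus one for the NUL.

# Hypothetical helper: print the number of bytes (not characters) in $1.
# wc -c counts bytes regardless of locale; tr strips any padding.
byte_len() {
  printf '%s' "$1" | wc -c | tr -d ' \t'
}

lower=$(printf '\311\220')       # c9 90    = U+0250 LATIN SMALL LETTER TURNED A
upper=$(printf '\342\261\257')   # e2 b1 af = U+2C6F LATIN CAPITAL LETTER TURNED A

input="aaaaa${lower}"            # same bytes as x=$'aaaaa\xc9\x90'
expected="AAAAA${upper}"         # what ${x^^} should produce

in_len=$(byte_len "$input")
out_len=$(byte_len "$expected")
buf_size=$((in_len + 1))         # hypothetical buffer: input bytes + NUL
need=$((out_len + 1))            # bytes the correct result needs, incl. NUL

echo "input bytes:  $in_len"
echo "output bytes: $out_len"
echo "buffer size:  $buf_size, needed: $need, overflow: $((need - buf_size))"
```

Run as-is, it reports 7 input bytes and 8 output bytes, so under this
model a buffer of 8 bytes is one byte short of the 9 needed, which is
exactly the terminating null byte.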
