Re: Bash doesn't handle C with acute accent properly during readline's r
From: Chet Ramey
Subject: Re: Bash doesn't handle C with acute accent properly during readline's rl_change_case
Date: Fri, 12 May 2017 11:15:34 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.1.0
On 5/11/17 8:56 AM, Eduardo Bustamante wrote:
> The C with acute accent character: https://en.wikipedia.org/wiki/%C4%86
>
> - Upper case
> dualbus@debian:~$ printf '\U0106\n'
> Ć
>
> - Lower case
> dualbus@debian:~$ printf '\U0107\n'
> ć
>
> Now, in bash, if you type in ć, then run readline `upcase-word' on it,
> instead of ending up with the UTF-8 multibyte string for U+0106 (0xC4
> 0x86), you end up with 0x07 0x87.
>
> The parameter expansion doesn't seem to have that problem so I think
> it's a bug in readline:
Thanks for the report. This is a bug in readline.
> For some reason, rl_change_case thinks `c` is ASCII:
>
> (gdb) call isascii((unsigned char)c)
> $8 = 1
Because casting it to unsigned char discards all but the least
significant 8 bits, and what is left is a valid ASCII character. For ć
(U+0107), assuming `c' holds the code point 0x107, the cast yields 0x07.
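
To illustrate the truncation, here is a minimal sketch in C. It is not
the readline source: `low_byte' and `is_ascii_value' are hypothetical
helpers made up for this example, and it assumes the character value
is held in an int as the code point 0x107.

```c
#include <limits.h>

/* Hypothetical helper: what a cast to unsigned char keeps.
   With 8-bit chars, only the least significant 8 bits survive. */
static unsigned char low_byte(int c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: range check on the full value, with no cast.
   Returns 1 for 0x00..0x7f, 0 otherwise. */
static int is_ascii_value(int c)
{
    return c >= 0 && c <= 0x7f;
}

/* Compile-time check of the assumption that chars are 8 bits wide. */
typedef char assume_8_bit_char[(CHAR_BIT == 8) ? 1 : -1];
```

With c = 0x0107, low_byte(c) is 0x07, so is_ascii_value(low_byte(c))
is true while is_ascii_value(c) is false. Testing the value before
narrowing it is one way to avoid the misclassification; whether the
actual fix looks like that is not stated here.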
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://cnswww.cns.cwru.edu/~chet/