From: Chet Ramey
Subject: Re: UTF-8 issue
Date: Mon, 06 Dec 2004 09:51:54 -0500
User-agent: Mozilla Thunderbird 0.9 (Macintosh/20041103)
Mariano Suárez-Alvarez wrote:
> Hi,
>
> someone just made me note the following behavior with respect to UTF-8
> handling. On a bash command line,
>
> 1) type: read A
> 2) type a ñ character, that is, a U+00F1 LATIN SMALL LETTER N WITH TILDE character
> 3) now backspace it away and hit Enter.
> 4) now say: echo $A | od -x
> 5) you should see
>
>    0000000 0ac3
>    0000002
>
> although it should be just 0a. (Note UTF-8 for the ñ character is
> 0xC3 0xB1, so I'm getting the remnants of the deleted ñ.)
>
> I don't know if this is due to bash doing something wrong during the read
> (maybe it does not set up the line discipline correctly?) or something
> else. So you are my first try at nailing this ;-)
I am able to reproduce this using a UTF-8 locale, but I'm not sure it's
bash's problem. Since this is a buffered read, bash just calls read(2) and
returns the bytes one at a time to the read builtin. read(2) returns two
bytes: the first byte of the multibyte character, and newline.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
( ``Discere est Dolere'' -- chet )
Live...Laugh...Love
Chet Ramey, ITS, CWRU    chet@po.cwru.edu    http://tiswww.tis.cwru.edu/~chet/