Re: UTF-8 issue
From: Tim Waugh
Subject: Re: UTF-8 issue
Date: Mon, 6 Dec 2004 16:37:41 +0000
User-agent: Mutt/1.4.1i
On Mon, Dec 06, 2004 at 09:51:54AM -0500, Chet Ramey wrote:
> Mariano Suárez-Alvarez wrote:
> >Hi,
> >
> >someone just made me note the following behavior with respect to UTF-8
> >handling: on a bash command line,
> >
> > 1) type: read A
> > 2) type a ñ character, that is, a U+00F1 LATIN SMALL LETTER N
> > WITH TILDE character
> > 3) now backspace it away and hit Enter.
> > 4) now say: echo $A | od -x
> > 5) you should see
> >
> > 0000000 0ac3
> > 0000002
> >
> > although it should be just 0a. (Note UTF-8 for the ñ
> > character is 0xC3 0xB1, so I'm getting the remnants of the
> > deleted ñ)
> >
> >
> >I don't know if this is due to bash doing something wrong during the
> >read (maybe it does not set up the line discipline correctly?) or
> >something else. So you are my first try at nailing this ;-)
>
> I am able to reproduce this using a UTF-8 locale, but I'm not sure it's
> bash's problem. Since this is a buffered read, bash just calls read(2)
> and returns characters one at a time to the read builtin. read(2)
> returns two characters: the first byte of the multibyte character, and
> newline.
I haven't been able to reproduce this problem at all:
$ read A
ñ^H
$ echo $A | od -tx1
0000000 c3 b1 08 0a
0000004
$ read -e A
<-- here I entered the character and pressed backspace once
[twaugh@gene ~]$ echo $A | od -tx1
0000000 0a
0000001
GNU bash, version 3.00.16(1)-release (i386-redhat-linux-gnu)
$ rpm -q bash
bash-3.0-24
$ echo $LANG
en_GB.UTF-8
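
For anyone comparing the two dumps in this thread, here is a minimal sketch of the byte sequences involved. It assumes a POSIX shell with od and xargs; dump_bytes is a hypothetical helper made up for this illustration, not something from bash or from the original report. It only shows what the bytes look like if one byte of the two-byte ñ is erased; it does not show where in the terminal or bash that erase happens.

```shell
# Hypothetical helper (not part of the original thread): hex-dump stdin
# one byte at a time and collapse od's whitespace into single spaces.
dump_bytes() {
    od -An -tx1 | xargs echo
}

# ñ (U+00F1) written as its two UTF-8 bytes in octal (0xC3 0xB1), then newline.
full=$(printf '\303\261\n' | dump_bytes)

# What is left if only the last byte of ñ is erased: the lead byte 0xC3,
# then the newline.
partial=$(printf '\303\n' | dump_bytes)

echo "full:    $full"
echo "partial: $partial"
```

The "partial" case is the same two bytes as in the original report. There od -x printed them as a single 16-bit word in host byte order, which is why c3 0a shows up as 0ac3 on a little-endian machine; od -tx1 avoids that ambiguity.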
Tim.