Re: UTF-8 issue

bug-bash

From:	Tim Waugh
Subject:	Re: UTF-8 issue
Date:	Mon, 6 Dec 2004 16:37:41 +0000
User-agent:	Mutt/1.4.1i

On Mon, Dec 06, 2004 at 09:51:54AM -0500, Chet Ramey wrote:

> Mariano Suárez-Alvarez wrote:
> >Hi,
> >
> >someone just made me note the following behavior with respect to UTF-8
> >handling: on a bash command line,
> >
> >        1) type: read A
> >        2) type a ñ character, that is, a U+00F1 LATIN SMALL LETTER N
> >        WITH TILDE character
> >        3) now backspace it away and hit Enter.
> >        4) now say: echo $A | od -x
> >        5) you should see 
> >        
> >                0000000 0ac3
> >                0000002
> >                
> >        although it should be just 0a. (Note UTF-8 for the ñ
> >        character is 0xC3 0xB1, so I'm getting the remnants of the
> >        deleted ñ) 
> >        
> >
> >I don't know if this is due to bash doing something wrong during the
> >read (maybe it does not set up the line discipline correctly?) or
> >something else. So you are my first try at nailing this ;-)
> 
> I am able to reproduce this using a UTF-8 locale, but I'm not sure it's
> bash's problem.  Since this is a buffered read, bash just calls read(2)
> and returns characters one at a time to the read builtin. read(2)
> returns two characters:  the first byte of the multibyte character, and
> newline.

I haven't been able to reproduce this problem at all:

$ read A
ñ^H
$ echo $A | od -tx1
0000000 c3 b1 08 0a
0000004
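
(I used -tx1 rather than -x so the bytes print in stream order. od -x groups the input into 16-bit words in host byte order, so on a little-endian machine the two bytes c3 0a print as 0ac3, as in the original report. A rough Python sketch of that grouping follows; od_x_words is a hypothetical helper for illustration only, not od's actual code, and it assumes a little-endian host.)

```python
import struct

def od_x_words(data):
    """Approximate `od -x` on a little-endian host: 16-bit hex words."""
    if len(data) % 2:
        data += b"\x00"  # a trailing odd byte is padded with zero
    count = len(data) // 2
    # "<H" = little-endian unsigned 16-bit; the second byte is the high half
    words = struct.unpack("<%dH" % count, data)
    return ["%04x" % w for w in words]

print(od_x_words(b"\xc3\n"))  # the two bytes c3 0a from the report
```

This prints ['0ac3'], i.e. the same two bytes that -tx1 would show as "c3 0a".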

$ read -e A
   <-- here I entered the character and pressed backspace once
[twaugh@gene ~]$ echo $A | od -tx1
0000000 0a
0000001
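
The bytes in the original report (c3 0a) are what you would expect if the backspace removed only the last byte of the two-byte sequence rather than the whole character. A minimal sketch of the two behaviours in Python; erase_byte and erase_char are hypothetical names for illustration, not functions in bash, readline or the kernel:

```python
def erase_byte(buf):
    """Drop only the last byte, ignoring character boundaries."""
    return buf[:-1]

def erase_char(buf):
    """Drop the last whole UTF-8 character.

    Skip back over continuation bytes (10xxxxxx), then drop the lead byte.
    """
    end = len(buf)
    while end > 0 and (buf[end - 1] & 0xC0) == 0x80:
        end -= 1
    return buf[:max(end - 1, 0)]

typed = "ñ".encode("utf-8")  # b'\xc3\xb1'

# Simulate "type ñ, backspace, Enter" under each erase rule.
print((erase_byte(typed) + b"\n").hex(" "))  # c3 0a
print((erase_char(typed) + b"\n").hex(" "))  # 0a
```

The second line matches what I see with read -e above; the first matches the original report.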

GNU bash, version 3.00.16(1)-release (i386-redhat-linux-gnu)
$ rpm -q bash
bash-3.0-24
$ echo $LANG
en_GB.UTF-8

Tim.
