bug-bash

[3.0] UTF-8 and ${#var} or ${var: -1}


From: Stephane Chazelas
Subject: [3.0] UTF-8 and ${#var} or ${var: -1}
Date: Thu, 29 Jul 2004 16:44:41 +0100
User-agent: Mutt/1.5.6i

At
http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_02

   ${#parameter}
           String Length. The length in characters of the value
                                        ~~~~~~~~~~
           of parameter shall be substituted.

If the parameter contains a single two-byte UTF-8 character,
${#parameter} returns 2 instead of 1:

bash-3.00$ uname -rsvm
SunOS 5.8 Generic_117000-05 sun4u
bash-3.00$ locale charmap
UTF-8
bash-3.00$ a=$(printf '%b' '\0303\0251')
bash-3.00$ [[ $a = ? ]] && echo yes
yes
bash-3.00$ echo ${#a}
2
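For comparison, the byte count and the character count of a UTF-8 string can be obtained without relying on the shell's own multibyte handling. The following is a minimal POSIX sh sketch, not part of the original report. It assumes valid UTF-8 input and a tr that accepts octal escapes and byte ranges in the C locale. The byte length counts every byte. The character length drops UTF-8 continuation bytes (0x80 to 0xBF) first. Going by the POSIX wording quoted above, ${#a} should report the second number.

```shell
#!/bin/sh
# Sketch (not from the original report): count bytes and UTF-8 characters
# of a string without depending on the shell's multibyte support.
# Assumes the string holds valid UTF-8.

a=$(printf '%b' 'A\0303\0251B')   # "AéB": 3 characters encoded in 4 bytes

# Byte count: in the C locale wc -c counts raw bytes.
# tr -d ' ' strips the padding some wc implementations add.
bytes=$(printf %s "$a" | LC_ALL=C wc -c | tr -d ' ')

# Character count: UTF-8 continuation bytes lie in 0x80-0xBF (octal 200-277).
# Deleting them leaves exactly one byte per encoded character.
chars=$(printf %s "$a" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' ')

echo "bytes=$bytes chars=$chars"   # prints: bytes=4 chars=3
```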

${var: -<n>} is also affected. With a multibyte character in the
string, the last command should print "B" (octal 102) but prints nothing:

bash-3.00$ a=AeB
bash-3.00$ printf %s "${a: -1}" | od -to1
0000000 102
0000001
bash-3.00$ a=$(printf '%b' 'A\0303\0251B')
bash-3.00$ printf %s "${a: -1}" | od -to1
0000000


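It seems OK in other places. In the first three commands below, $a
again holds just the two-byte character é, as in the first example: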

bash-3.00$ printf %s "${a#?}" | od -to1
0000000
bash-3.00$ case $a in ?) echo yes;; esac
yes
bash-3.00$ printf %4s "$a" | od -to1
0000000 040 040 040 303 251
0000005
bash-3.00$ a=$(printf '%b' 'A\0303\0251B')
bash-3.00$ printf %s "${a:2}" | od -to1
0000000 102
0000001
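Because ? patterns do match the multibyte character, one possible workaround for ${a: -1} is to take the last character with pattern removal. The following is a minimal sketch under that assumption, and it is not taken from the report. It uses only the standard POSIX expansions ${a%?} and ${a#"$prefix"}. The inner quotes keep $prefix from being treated as a pattern.

```shell
#!/bin/sh
# Sketch of a possible workaround (hypothetical, not from the report):
# extract the last character via pattern removal instead of ${a: -1}.

a=$(printf '%b' 'A\0303\0251B')   # "AéB"

prefix=${a%?}            # everything except the last character: "Aé"
last=${a#"$prefix"}      # remove that literal prefix, leaving "B"
printf %s "$last" | od -to1
```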

regards,
Stephane
