${#var} reports wrong size on invalid utf8 multibyte characters

bug-bash


From:	hcz
Subject:	${#var} reports wrong size on invalid utf8 multibyte characters
Date:	Sun, 27 Mar 2005 13:16:16 +0200

Configuration Information [Automatically generated, do not change]:
Machine: i386
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i386' 
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i386-pc-linux-gnu' 
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL 
-DHAVE_CONFIG_H  -I.  -I../bash -I../bash/include -I../bash/lib   -g -O2
uname output: Linux tazzelwurm 2.6.11hcz1 #2 Fri Mar 11 20:01:21 CET 2005 i686 
GNU/Linux
Machine Type: i386-pc-linux-gnu

Bash Version: 3.0
Patch Level: 16
Release Status: release

Description: If a string contains an invalid UTF-8 sequence, ${#var}
        reports its size as the number of characters from the start
        of the string up to, but not including, the invalid sequence.

        This way you can construct a string which is handled as
        non-empty by "test -n" and "test -z", but is reported by
        ${#var} as having zero size.

Repeat-By:

        x=$'\xff'foobar

        LC_ALL=C
        echo ${#x}
        # reports: 7

        LC_ALL=en_US.utf-8
        echo ${#x}
        # reports: 0
        [ -n "$x" ] && echo non-empty
        # echoes: non-empty

        x=baz$'\xff'foobar
        LC_ALL=en_US.utf-8
        echo ${#x}
        # reports: 3        

Fix: 
        I understand that, strictly speaking, this is undefined
        behavior, but I'd suggest that the count not stop when an
        invalid multibyte sequence is encountered. Instead, count the
        sequence by its number of bytes (or as 1), since the string is
        definitely non-empty.
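
Until the counting changes, one possible workaround is to measure the string in bytes under the C locale, where every byte counts as one character. The sketch below is only an illustration: byte_len is a hypothetical helper name, not part of bash, and it assumes that assigning LC_ALL inside a subshell switches the locale used by ${#...}. The test byte is written as the octal escape \377 (0xff) via printf so the snippet does not depend on $'...' quoting.

```shell
# byte_len: hypothetical helper (not a bash builtin).
# Prints the length of its first argument in bytes.
# The function body is a subshell, so the LC_ALL change
# does not leak into the caller's environment.
byte_len() (
    LC_ALL=C
    printf '%s\n' "${#1}"
)

# One invalid byte (0xff) followed by "foobar": 7 bytes in total.
x="$(printf '\377foobar')"
byte_len "$x"    # prints 7
```

This gives a byte count rather than a character count, so it differs from ${#x} for valid multibyte text; it is meant only as a locale-independent size and emptiness check.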

Thanks,

 Heike

