bug-bash

Re: Pattern replacement fails if string contains multibyte characters


From: Chet Ramey
Subject: Re: Pattern replacement fails if string contains multibyte characters
Date: Fri, 28 Sep 2007 17:02:42 -0400
User-agent: Thunderbird 2.0.0.6 (Macintosh/20070728)

Bernd Eggink wrote:
> This happens on a utf-8 based system (CRUX 2.3), LANG=de_DE.UTF-8:
> 
> t="123abc456äöüABCD"
> echo ${t//[a-c]/}
> # output: 123456öüCD
> # (should be: "123456äöüABCD")
> 
> echo ${t//[!a-c]/}
> # output: abcäAB
> # (should be: "abc")
> 
> bash --version:
> GNU bash, version 3.2.25(1)-release (i686-pc-linux-gnu)
> 
> Without multibyte chars, replacement works as expected. It looks like a
> bug, or am I missing something?

I get the expected output on Mac OS X and FreeBSD, and the same output you
do on FC6.

The difference is in the GNU libc implementation of strcoll(), which bash
uses to compare characters for range matching.  The glibc implementation
ignores the locale; the other systems incorporate the current locale's
collating sequence into their strcoll() implementations.
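
If range matching is what goes wrong here, one possible workaround is to
list the characters explicitly instead of writing a range, so that no
collation comparison is needed. The following is only a sketch built on
that assumption and on the string from the report; the variable names
`removed` and `kept` are made up for illustration, and it has not been
checked against bash 3.2.25 on the affected systems.

```shell
#!/usr/bin/env bash
# String from the original report.
t="123abc456äöüABCD"

# [abc] enumerates the characters, so matching does not depend on
# comparing each character against the endpoints of a range.
removed=${t//[abc]/}    # delete every a, b, or c
kept=${t//[!abc]/}      # delete everything except a, b, and c

echo "$removed"
echo "$kept"
```

This only helps for short ranges that are practical to spell out; it does
not change how bash handles ranges such as [a-c] themselves.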

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                       Live Strong.  No day but today.
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/



