Re: Something strange with string replacements
From: Greg Wooledge
Subject: Re: Something strange with string replacements
Date: Mon, 12 Oct 2015 08:29:53 -0400
User-agent: Mutt/1.4.2.3i
On Sun, Oct 11, 2015 at 04:33:11PM -0700, gaspar.bin@gmail.com wrote:
> I was just testing if I could do some things with bash and then I came across
> this:
> $ tigres="Un tigre, dos tigres, tres tigres"
> $ echo ${tigres//[A-Z]/[a-z]}
>
> tt [a-z][a-z][a-z][a-z][a-z], Ale cto kkk log nfs tes tmp tst www
> [a-z][a-z][a-z][a-z][a-z][a-z], aeat home kaka lmms Mail prog temp test
> Clases kernel kfreir Mariah Música system unbind Vídeos webdav
>
> The reply was strange, Ale, cto, kkk, log, nfs, tes... are files in the
> current directory where I'm running this.
The [a-z] on the right hand side is a literal string that your letters
are replaced by. So in the first pass, depending on how your current
locale is defined by your operating system, your input string is replaced
by something like:
[a-z][a-z] [a-z][a-z][a-z][a-z][a-z], [a-z][a-z][a-z]
[a-z][a-z][a-z][a-z][a-z][a-z], [a-z][a-z][a-z][a-z]
[a-z][a-z][a-z][a-z][a-z][a-z]
(You can see that by quoting your parameter expansion properly!)
In the second pass, which occurs since you DIDN'T quote the parameter
expansion, each of these glob-style patterns is replaced by all matching
file names. In your example, [a-z][a-z] is obviously replaced by tt.
And then [a-z][a-z][a-z] is replaced by Ale cto kkk log nfs tes tmp tst www.
And so on.
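The two passes can be reproduced in isolation. The following is only a sketch: the scratch directory and the file names tt and abcde are made up for the demonstration, and [[:alpha:]] is used in place of [A-Z] so that the first pass does not depend on the locale.

```shell
#!/usr/bin/env bash
# Sketch only: the scratch directory and the files tt/abcde are hypothetical.
LC_ALL=C                      # keep the letter matching predictable
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch tt abcde

s="Un tigre"

# Pass 1: each letter is replaced by the literal five characters [a-z].
pass1="${s//[[:alpha:]]/[a-z]}"
printf 'pass 1: %s\n' "$pass1"

# Pass 2: left unquoted, the result is word-split, and each word is
# treated as a glob and matched against the files in the directory.
pass2=$(echo $pass1)
printf 'pass 2: %s\n' "$pass2"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

With only those two files present, the first line should show the bracket strings untouched and the second should show the file names tt and abcde in their place.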
So, you have several complex things going on here:
1) In your locale, [A-Z] matches more than just uppercase letters.
2) [a-z] is a literal string, not a tr(1)-style replacement group.
3) You didn't quote the parameter expansion, so sequences of [a-z]...
are replaced by filenames in $PWD.
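If the goal was to convert uppercase letters to lowercase (an assumption on my part, since the original message does not say so), something along these lines may be closer to what was intended. The variable names lower_tr and lower_bash are just for illustration, and the second form needs bash 4.0 or later.

```shell
#!/usr/bin/env bash
tigres="Un tigre, dos tigres, tres tigres"

# tr(1) really does map one set of characters onto another.
lower_tr=$(printf '%s\n' "$tigres" | tr '[:upper:]' '[:lower:]')
printf '%s\n' "$lower_tr"

# bash 4.0+ has a case-modification expansion that does the same job
# without an external command.
lower_bash="${tigres,,}"
printf '%s\n' "$lower_bash"
```

Both expansions are quoted, so neither result is subject to word splitting or filename expansion.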
imadev:~/tmp$ touch Ale cto kkk log nfs tes
imadev:~/tmp$ tigres="Un tigre, dos tigres, tres tigres"
imadev:~/tmp$ echo "${tigres//[A-Z]/[a-z]}"
[a-z][a-z] [a-z][a-z][a-z][a-z][a-z], [a-z][a-z][a-z]
[a-z][a-z][a-z][a-z][a-z][a-z], [a-z][a-z][a-z][a-z]
[a-z][a-z][a-z][a-z][a-z][a-z]
imadev:~/tmp$ echo ${tigres//[A-Z]/[a-z]}
[a-z][a-z] [a-z][a-z][a-z][a-z][a-z], cto kkk log nfs tes
[a-z][a-z][a-z][a-z][a-z][a-z], [a-z][a-z][a-z][a-z]
[a-z][a-z][a-z][a-z][a-z][a-z]
imadev:~/tmp$ locale
LANG=en_US.iso88591
LC_CTYPE="en_US.iso88591"
LC_COLLATE="en_US.iso88591"
LC_MONETARY="en_US.iso88591"
LC_NUMERIC="en_US.iso88591"
LC_TIME=POSIX
LC_MESSAGES="en_US.iso88591"
LC_ALL=
In the C locale, you would very likely get different results:
imadev:~/tmp$ (LC_ALL=C; echo "${tigres//[A-Z]/[a-z]}")
[a-z]n tigre, dos tigres, tres tigres
On my system, as you can see, [A-Z] in the en_US.iso88591 locale matches
both upper- and lowercase letters (but possibly not all of them). In
the C locale, [A-Z] matches only A, B, C, D, ..., Z. Your system may
have a different locale definition.
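To see what [A-Z] matches on a given system, a quick probe like the one below may help. It is a rough sketch: matches_AZ is a made-up helper name, and it only tests a handful of sample letters. The last part uses the [[:upper:]] character class, which matches uppercase letters as the locale defines them instead of a range that depends on collation order.

```shell
#!/usr/bin/env bash
# matches_AZ is a hypothetical helper: it prints the sample letters
# that the pattern [A-Z] matches under the locale in effect.
matches_AZ() {
  local ch out=
  for ch in a b y z A B Y Z; do
    [[ $ch == [A-Z] ]] && out+=$ch
  done
  printf '%s\n' "$out"
}

printf 'current locale: %s\n' "$(matches_AZ)"

# In the C locale, only the uppercase samples should match.
c_result=$(LC_ALL=C; matches_AZ)
printf 'C locale:       %s\n' "$c_result"

# A character class avoids the range question altogether.
tigres="Un tigre, dos tigres, tres tigres"
class_result="${tigres//[[:upper:]]/X}"
printf '%s\n' "$class_result"
```

The "current locale" line will vary from system to system, which is the point of the exercise.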
Also note that recent versions of bash have a globasciiranges shopt,
which changes the meaning of [A-Z] and [a-z]. Clearly you did not
have this option enabled, but you may want to play around with it to
see how it changes your code.
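A minimal way to experiment with it, assuming a bash new enough to know the option (4.3 or later, as far as I know; some newer builds may already have it switched on, so check the current state first):

```shell
#!/usr/bin/env bash
tigres="Un tigre, dos tigres, tres tigres"

# Show the current state of the option (exit status is nonzero when
# the option is off or unknown, hence the "|| true").
shopt globasciiranges || true

if shopt -s globasciiranges 2>/dev/null; then
  # With the option on, ranges such as [A-Z] compare as in the C locale.
  ascii_result="${tigres//[A-Z]/X}"
  printf '%s\n' "$ascii_result"
else
  echo "globasciiranges is not available in this bash" >&2
fi
```

With the option enabled, only the U should be replaced, whatever the locale is.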