bug-bash

Re: documentation bug re character range expressions


From: Marcel (Felix) Giannelia
Subject: Re: documentation bug re character range expressions
Date: Fri, 03 Jun 2011 09:12:07 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100330 Shredder/3.0.4

On 2011-06-03 05:09, Greg Wooledge wrote:
> Oh, look, there's more!

[...]

> See?  Both tr(1) and ls(1) do it too!

Right; I forgot about ls (because "alias ls='LC_COLLATE=C ls'" has been in my .bashrc for so long that I completely forgot it was there :) ), and didn't think to try tr. For me, though, tr appears to be case-sensitive under all locales.

And yours looks broken -- how does
echo Hello World | tr A-Z a-z
result in a bunch of non-ASCII characters? (That's how it looked when it got here, at any rate -- maybe one of our mail servers did something.)
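
For comparison, here is a minimal sketch of the same tr command with the locale pinned to C for that one call, so that A-Z and a-z are plain byte ranges whatever the user's locale says. The function name lower_ascii is made up for illustration; LC_ALL is used rather than LC_COLLATE because LC_ALL overrides the other LC_* variables.

```shell
# Lowercase ASCII letters with the locale forced to C for this one call,
# so the ranges A-Z and a-z do not depend on the caller's LC_COLLATE.
# (lower_ascii is a hypothetical helper name, not from the thread.)
lower_ascii() {
    LC_ALL=C tr 'A-Z' 'a-z'
}

echo 'Hello World' | lower_ascii
```

In the C locale this prints "hello world"; what the unpinned command prints depends on the system's tr and locale data.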


>> Even grep, whose man page says it obeys LC_COLLATE and the locale,
>> actually has [a-c] equivalent to [abc] on all locales. Someone must have
>> snuck in and fixed it.
>
> You must live in a strange and peculiar world.
>
> imadev:~/qwerty$ type grep
> grep is hashed (/usr/bin/grep)
> imadev:~/qwerty$ echo 'brown cow' | grep '[A-C]'
> brown cow
> imadev:~/qwerty$ echo 'BROWN COW' | grep '[a-c]'
> BROWN COW
>
> Is every single bit of your knowledge born out of familiarity with just
> ONE operating system with weird extensions?

I tried this on a few systems with different distros on them. The only non-Linux systems I have access to are running things so old that they only have the C locale, so I can't be sure about them. (For instance, the only Solaris system I can get at doesn't support Unicode at all.)

Here are a couple:

[c69:~]$ uname -sr
Linux 2.6.32-31-generic
[c69:~]$ type grep
grep is /bin/grep
[c69:~]$ locale | grep LC_COLLATE
LC_COLLATE="en_US.UTF-8"
[c69:~]$ echo 'brown cow' | grep '[A-C]'
[c69:~]$ echo 'BROWN COW' | grep '[a-c]'
[c69:~]$


u-elive ~(0)$ uname -sr
Linux 2.6.32-26-generic
u-elive ~(0)$ locale | grep LC_COL
LC_COLLATE="en_US.UTF-8"
u-elive ~(0)$ type grep
grep is /bin/grep
u-elive ~(0)$ echo 'brown cow' | grep '[A-C]'
u-elive ~(0)$ echo 'BROWN COW' | grep '[a-c]'
u-elive ~(0)$


The university used to have a big AIX system, but again, I think it was a version from before the days of Unicode locales. Maybe grep working as expected is a Linux thing?

And if so, then you're right about me living in a "strange and peculiar world" -- it's the world of people and organizations too poor to afford proprietary Unices; one would never see HP-UX in a world like that when there are cheaper alternatives. I'll spin up some BSDs in a virtual machine and check those later -- anything else I should try?
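
To repeat the test above on other systems, something like the following sketch could be used. It runs the same 'brown cow' check under a named locale and reports whether the uppercase range [A-C] matched lowercase text. The function name range_check is invented here, and the en_US.UTF-8 call assumes that locale is installed; where it is missing, the C library usually falls back to C behaviour.

```shell
# Report whether grep's [A-C] range matches all-lowercase input under
# the locale given as $1. (range_check is a hypothetical helper name.)
range_check() {
    if echo 'brown cow' | LC_ALL="$1" grep '[A-C]' >/dev/null 2>&1; then
        echo "$1: [A-C] matched lowercase"
    else
        echo "$1: [A-C] did not match lowercase"
    fi
}

range_check C
range_check en_US.UTF-8   # assumption: this locale exists on the system
```

The C line should always report no match, since in that locale the range is the three bytes A, B and C. The second line is the one that differs between the systems shown above.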


> You ought to report the bug in your vendor's grep(1) implementation, if
> it is actually broken as you describe.


And no, I'm going to keep very quiet about this "bug" in grep -- because it's working the way I want/expect it to, and fixing it would doubtless break many, many shell scripts and cause data loss for a fair number of people :)

~Felix.


