bug-bash

Re: Curious case statement error


From: Bob Proulx
Subject: Re: Curious case statement error
Date: Sat, 13 Aug 2016 14:19:51 -0600
User-agent: Mutt/1.5.24 (2015-08-30)

pskocik@gmail.com wrote:
>             [a-z]) echo "Character is in Lowercase";;
>             [A-Z]) echo "Character is in Uppercase";;

What is the output of 'locale' for you?  It will almost certainly show
that your LC_COLLATE is *NOT* set to the C locale but to some other
locale.  Your statements above are correct only in the C locale.  How
a range is interpreted depends upon your locale setting and the
program's specific handling of it.  If the locale is en_US.UTF-8 then
the above does not apply.  Instead it is more likely this:

  [a-z]) echo "Character is in aAbBcC...z range";;
  [A-Z]) echo "Character is in AbBcC...zZ range";;

This is due to the locale collation ordering in your environment.  It
is not specific to bash and also affects grep, sed, awk, sort, and so
forth.  (However newer versions of most programs are specifically
working around this now.  The problem used to be more common a few
years ago but with recent releases the problem is disappearing.)

In human language locales such as en_US.UTF-8 one must use the
[:lower:] and [:upper:] character classes instead of ranges.

  [[:lower:]]) echo "Character is in Lowercase";;
  [[:upper:]]) echo "Character is in Uppercase";;
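
As a minimal sketch, the two patterns can be dropped into a complete
case statement like the one below.  The function name classify and the
catch-all branch are my own additions for illustration and are not
from the original script.

```shell
#!/usr/bin/env bash
# classify is a hypothetical helper name used only for this example.
# The character classes match by class membership, not by collation
# range, so the result does not depend on LC_COLLATE ordering.
classify() {
  case "$1" in
    [[:lower:]]) echo "Character is in Lowercase" ;;
    [[:upper:]]) echo "Character is in Uppercase" ;;
    *)           echo "Character is something else" ;;
  esac
}

classify a
classify B
```

For plain ASCII letters this should report lowercase for a and
uppercase for B whether the locale is C or en_US.UTF-8.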

The grep man page explains this in detail so let me quote it here:

  Within a bracket expression, a range expression consists of two
  characters separated by a hyphen.  It matches any single character
  that sorts between the two characters, inclusive, using the locale's
  collating sequence and character set.  For example, in the default C
  locale, [a-d] is equivalent to [abcd].  Many locales sort characters
  in dictionary order, and in these locales [a-d] is typically not
  equivalent to [abcd]; it might be equivalent to [aBbCcDd], for
  example.  To obtain the traditional interpretation of bracket
  expressions, you can use the C locale by setting the LC_ALL
  environment variable to the value C.
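
The last sentence of that quote can be tried out with a small sketch
like this one.  The helper name in_range_c is hypothetical.  Setting
LC_ALL on the grep command itself affects only that one grep process.

```shell
#!/usr/bin/env bash
# in_range_c is a hypothetical helper name used only for this example.
# It prints each argument that is a single character in [a-d], with
# grep forced into the C locale so that [a-d] means exactly [abcd].
in_range_c() {
  printf '%s\n' "$@" | LC_ALL=C grep '^[a-d]$'
}

# In the C locale only the lowercase a and c are printed here.
in_range_c a B c D e
```

Running the same pipeline without LC_ALL=C shows what the current
locale and the installed grep version make of the range.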

For reasons that most of us common people disagree with, the powers
that be decided that locale-specific collation sequences would ignore
punctuation and would fold case using "dictionary" collation ordering.

Note also that bash's collation sequence is set when it is started.
Changing the LC_ALL or LC_COLLATE variables only affects newly
launched programs; it will have no effect on the currently running
bash shell.  To change it for bash you would need something like
this:

  $ ...shell with LC_COLLATE set to en_US.UTF-8 with bad collation ...
  $ env LC_COLLATE=C bash
  $ ... works now ...

I mention this because otherwise people try changing the variable and
then don't see a change in the already running bash shell.
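
A non-interactive version of the same idea, as a rough sketch: run
the test in a freshly started bash with the locale forced to C.  The
function name check_in_c_locale is hypothetical.  I use LC_ALL here
rather than LC_COLLATE because LC_ALL, if it is already set in the
environment, takes precedence over LC_COLLATE.

```shell
#!/usr/bin/env bash
# check_in_c_locale is a hypothetical helper name for this example.
# It starts a new bash with LC_ALL=C and runs the original range-based
# case statement there, passing the character in as $1.
check_in_c_locale() {
  env LC_ALL=C bash -c '
    case "$1" in
      [a-z]) echo "Character is in Lowercase" ;;
      [A-Z]) echo "Character is in Uppercase" ;;
      *)     echo "Character is something else" ;;
    esac
  ' _ "$1"
}

check_in_c_locale b
check_in_c_locale B
```

In the child shell the ranges follow the C locale ordering, so b is
reported as lowercase and B as uppercase.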

Bob


