grep-devel


Locale aware range expressions?


From: Ronan Pigott
Subject: Locale aware range expressions?
Date: Sun, 28 Jan 2024 02:43:30 +0000

Hi grep,

The grep manual, in the section titled "Character Classes and Bracket
Expressions" is careful to point out the effect of the user's locale and
collation order on the meaning of range expressions. In particular, it
highlights that [a-d] is equivalent to [abcd] in the C locale, but may be
equivalent to [aAbBcCdD] in the user's locale because:

  "It matches any single character that sorts between the two characters,
  inclusive, using the locale's collating sequence and character set."
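Read literally, that sentence makes [a-d] match whatever the locale's `sort` places between `a` and `d`. Here is a minimal POSIX sh sketch that prints exactly that set. `collation_range` is a hypothetical helper name, not part of grep. The sketch models the manual's wording only and says nothing about how grep actually implements ranges.

```shell
#!/bin/sh
# Hypothetical helper: models the manual's reading of a range expression.
# collation_range LOW HIGH reads one character per line on stdin, sorts the
# lines with sort(1) (which honors LC_COLLATE / LC_ALL), and prints the
# lines from LOW through HIGH inclusive. That is what [LOW-HIGH] would match
# if ranges followed the locale's collating sequence.
collation_range() {
    sort | sed -n "/^$1\$/,/^$2\$/p"
}

# With the collation order shown in the sort output below, this prints
# a A b B c C d; under LC_ALL=C it prints only a b c d.
printf '%s\n' a b c d A B C D | collation_range a d
```

The range end is matched by line content, so the helper only makes sense for single characters that actually appear in the input.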

However, in my experience this is not true.

  $ grep ^NAME /etc/os-release; pacman -Q grep
  NAME="Arch Linux"
  grep 3.11-1
  
  $ locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)'
  LANG=en_US.UTF-8
  LC_COLLATE="en_US.UTF-8"
  LC_ALL=
  
  # locale-aware collation, exactly as described in grep(1)
  $ print -l {a..d} {A..D} | sort
  a
  A
  b
  B
  c
  C
  d
  D
  
  # only lowercase matches, despite A/B/C all sorting within the range
  $ print -l {a..d} {A..D} | grep '[a-d]'
  a
  b
  c
  d
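For anyone reproducing this outside zsh, here is a minimal sketch that swaps zsh's `print -l` for `printf`. For comparison, it adds three spellings whose meaning POSIX fixes independently of the collating sequence. It assumes a POSIX sh with grep on PATH, and `letters` is a hypothetical helper. It does not settle whether the manual or grep is at fault.

```shell
#!/bin/sh
# Hypothetical helper: the same eight test lines as the zsh session above.
letters() { printf '%s\n' a b c d A B C D; }

echo "[a-d] in the current locale:"
letters | grep '[a-d]'

# In the C locale a range follows byte order, so [a-d] is exactly [abcd].
echo "[a-d] with LC_ALL=C:"
letters | LC_ALL=C grep '[a-d]'

# An explicit list contains no range, so collation cannot widen it.
echo "[abcd]:"
letters | grep '[abcd]'

# If case-insensitive matching is wanted, request it with -i.
echo "[abcd] with -i:"
letters | grep -i '[abcd]'
```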

This contradicts the grep manual afaict. Is this a bug in grep or the
documentation? Is it user error?

Thanks,

Ronan


