Unicode range and enumeration support.
From: L A Walsh
Subject: Unicode range and enumeration support.
Date: Wed, 18 Dec 2019 11:15:46 -0800
User-agent: Thunderbird
On 2019/12/16 08:39, Greg Wooledge wrote:
> On Sat, Dec 14, 2019 at 02:48:16AM -0800, L A Walsh wrote:
>> On 2019/12/13 10:42, Greg Wooledge wrote:
>>> There's a larger issue to be addressed first. The man page says,
>>>   [...] When characters are supplied, the expression expands to each
>>>   character lexicographically between x and y, inclusive, using the
>>>   default C locale.
>> ----
>> If it says letters, that lends stronger support to including Unicode
>> ranges of letters and numbers, since the shell handles Unicode and
>> brace expansion with Unicode filenames works just fine. That ranges
>> don't work seems a bit of a wart.
> No, it won't include Unicode, because it very clearly says "C locale"
> right up there.
----
At one point in time, Bash only supported the C locale for display
and input. That isn't the case in the current Bash. Just because it
wasn't so in the past doesn't mean things can't or won't change in the
future. If that were true, we wouldn't have computers.
> The problem is, it is *not possible* to extract the set of characters
> out of an arbitrary locale. The locale interfaces simply are not built
> to allow it.
>
> You can do it in the C locale, simply because the C locale is a known,
> fixed quantity that you can hard-code. You can't do it in any other locale.
----
You can do it in Perl, JavaScript, Python, Ruby, C, C++ and others,
where range matching can identify characters of a specific type out
of arbitrary locales. For example (from
https://www.regular-expressions.info/unicode.html):
  \p{L} or \p{Letter}: any kind of letter from any language.
  \p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an uppercase variant.
  \p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
  ...
  \p{Math_Symbol}: any mathematical symbol.
  \p{N} or \p{Number}: any kind of numeric character in any script.
  \p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any
    script except ideographic scripts.
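As a rough illustration of the category properties listed above: Python's standard `unicodedata` module exposes each character's general category (`Lu`, `Ll`, `Nd`, ...). A code-point range can therefore be filtered the way `\p{Lu}` or `\p{Nd}` would match it. This is only a sketch. `chars_in_category` is a hypothetical helper, and treating a category prefix (`"L"`) as equivalent to `\p{L}` is an assumption about how the long property names map onto the short codes.

```python
import unicodedata


def chars_in_category(start, end, prefix):
    # Walk every code point from start to end (inclusive) and keep the ones
    # whose general category begins with prefix: "L" acts like \p{L},
    # "Lu" like \p{Lu}, "Nd" like \p{Nd}. Unassigned code points have
    # category "Cn", so they drop out automatically.
    return [
        chr(cp)
        for cp in range(ord(start), ord(end) + 1)
        if unicodedata.category(chr(cp)).startswith(prefix)
    ]


# Greek capitals: U+0391..U+03A9 spans 25 code points, but U+03A2 is
# unassigned, so only the 24 letters of the alphabet survive the filter.
greek_upper = chars_in_category("\u0391", "\u03a9", "Lu")
print("".join(greek_upper), len(greek_upper))

# Decimal digits in Devanagari (U+0966..U+096F).
print("".join(chars_in_category("\u0966", "\u096f", "Nd")))
```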
Those can be cross-sectioned with script-name properties for any
script in Unicode (Common, Arabic, Braille, Cherokee, Devanagari, ...,
Thai, Tibetan, Yi). The list of supported scripts is very extensive.
The tables are published in machine-readable form and are used to
build support for range matching and enumeration over a huge number
of characters.
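To make the "machine-readable tables" point concrete: the Unicode Character Database ships a file, Scripts.txt, whose data lines look like `first..last ; Script # comment`, or a single code point instead of a range. The sketch below parses lines in that layout and intersects a script with a general-category prefix, for example Greek with Lu. The `SAMPLE` text is hand-written in that layout for illustration and is not copied from the real file. `parse_scripts` and `enumerate_script` are hypothetical names.

```python
import unicodedata

# Hand-written sample in the Scripts.txt layout (illustrative only, NOT the
# real file): "first..last ; Script # comment", or a single code point.
SAMPLE = """\
# comment lines and blank lines are ignored

0041..005A    ; Latin # Lu  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # Ll  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0391..03A1    ; Greek # Lu  [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
03A3..03A9    ; Greek # Lu   [7] GREEK CAPITAL LETTER SIGMA..GREEK CAPITAL LETTER OMEGA
03B1..03C9    ; Greek # Ll  [25] GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER OMEGA
0966..096F    ; Devanagari # Nd  [10] DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE
"""


def parse_scripts(text):
    """Return a list of (first, last, script) tuples, code points as ints."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        cps, script = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")
        start = int(first, 16)
        ranges.append((start, int(last, 16) if last else start, script))
    return ranges


def enumerate_script(ranges, script, category_prefix=""):
    """All characters of `script` whose general category starts with the
    given prefix ("" keeps everything, "Lu" keeps uppercase letters)."""
    out = []
    for first, last, name in ranges:
        if name != script:
            continue
        for cp in range(first, last + 1):
            ch = chr(cp)
            if unicodedata.category(ch).startswith(category_prefix):
                out.append(ch)
    return out


ranges = parse_scripts(SAMPLE)
print("".join(enumerate_script(ranges, "Greek", "Lu")))  # Greek capitals
print("".join(enumerate_script(ranges, "Greek", "Ll")))  # Greek lowercase
```

A real implementation would read the full Scripts.txt (and UnicodeData.txt for categories) at build time. That is also where the memory trade-off the post mentions would show up.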
I.e., you can do it in pretty much any locale supported by Unicode,
not just the C locale. I can't begin to list all the references for
this, but just googling:

  "programming language support for ranges of numbers or alphabets in unicode"

will show a huge number of references.
Such features could be put in a loadable module (or modules), or made
"includable" at build time to manage memory use if desired or needed.
OTOH, as I already said, if one didn't want to do ranges of that kind,
one could follow the (I think) easier path: allow any arbitrary Unicode
range to be enumerated, while ensuring that ASCII-range metacharacters
are quoted.
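A minimal sketch of that easier path, in Python purely for illustration, and not how Bash implements brace expansion. It enumerates every code point from x to y, ascending or descending, and single-quotes any ASCII character the shell would otherwise treat specially. The names `enumerate_range`, `quote_ascii_meta` and `brace_range` are hypothetical. The exact set of characters treated as special is an assumption made for this sketch. For ASCII endpoints the walk should follow the same byte order as the C-locale behavior the man page describes.

```python
# Hypothetical sketch: enumerate an arbitrary code-point range {x..y} and
# quote ASCII characters the shell would otherwise treat specially.

# ASSUMPTION: this set (metacharacters plus quoting, glob, expansion and
# history characters) is a choice made for the sketch, not a Bash definition.
SHELL_SPECIAL = set(" \t\n|&;()<>\\'\"`$*?[]{}~#!")


def enumerate_range(x, y):
    """Every code point from x to y inclusive, descending if y < x."""
    step = 1 if ord(y) >= ord(x) else -1
    return [chr(cp) for cp in range(ord(x), ord(y) + step, step)]


def quote_ascii_meta(ch):
    """Single-quote ch if it is special to the shell; leave others alone."""
    if ch not in SHELL_SPECIAL:
        return ch
    if ch == "'":
        return "\\'"  # a single quote cannot appear inside '...'
    return "'" + ch + "'"  # everything else is literal inside '...'


def brace_range(x, y):
    return [quote_ascii_meta(ch) for ch in enumerate_range(x, y)]


print(" ".join(brace_range("Z", "a")))            # Z '[' '\' ']' ^ _ '`' a
print(" ".join(brace_range("\u03b1", "\u03b5")))  # α β γ δ ε
print(" ".join(brace_range("e", "a")))            # e d c b a
```

Walking plain code points rather than a locale's collation order is what keeps this simple. It needs no locale data at all, so the question of extracting a character set from an arbitrary locale never comes up.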