Re: Unicode range and enumeration support.
From: L A Walsh
Subject: Re: Unicode range and enumeration support.
Date: Mon, 23 Dec 2019 21:34:56 -0800
User-agent: Thunderbird

On 2019/12/23 12:58, Greg Wooledge wrote:
> On Mon, Dec 23, 2019 at 12:52:00PM -0800, L A Walsh wrote:
>> But it wasn't. It was about generating characters between two
>> characters that were given. In unicode, that would be two code points.
>> Nothing about enumeration.
>
> Please give an example, with a starting character and an ending
> character, and the resulting output. Explain why a bash user who
> uses your implementation to echo {ñ..💩} (N WITH TILDE to PILE OF
> POO) or whatever will feel that your answer is correct and sensible.

#!/usr/bin/perl
use warnings;use strict; use v5.16;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)'); # avoid "Wide character in print" warnings
use constant sch => q(Ⅰ); # start at roman numeral one
use constant ech => q(ⅿ); # end at small roman numeral 1000
use constant scp => ord(sch);
use constant ecp => ord(ech);
my %range;
$range{$_} = $_ for scp .. ecp;
my @range = sort {$a <=> $b} keys %range; # numeric sort: keys are code points
my $cnt=1000000;
my $RE=qr{\pN};
for (1 .. $cnt) {
my $out="";
for my $v (@range) {
my $ch=chr($v);
$out .= $ch.q( ) if $ch =~ m{$RE}; # match unicode property "is_num"
}
print $out."\n";
}
------------
1 million runs of the central loop take about 31s, so a single run
takes roughly 31 microseconds, which is pretty fast.
Unicode has 17 planes of 64K code points each = 1,114,112 code points,
of which about 10% are currently used
(https://www.babelstone.co.uk/Unicode/HowMany.html).
I'm not sure what you want me to say about the range you chose,
other than it would be about 128,000 characters.
It would be about the same argument, for or against, as for using
{241..128169}. I know you are trying to make some point, but
I'm missing it.
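
The size of that range can be checked with shell arithmetic. This is only a sketch: `range_size` is a hypothetical helper name, and the endpoints are written as hex code points (U+00F1 and U+1F4A9) rather than derived from the characters themselves, which avoids any dependence on the locale.

```shell
#!/usr/bin/env bash
# range_size START END: hypothetical helper (not a bash builtin).
# Prints how many code points lie between START and END, inclusive.
# Arithmetic expansion accepts hex constants such as 0xF1.
range_size() {
    echo $(( $2 - $1 + 1 ))
}

# U+00F1 (N WITH TILDE) through U+1F4A9 (PILE OF POO)
total=$(range_size 0xF1 0x1F4A9)

# The surrogate block U+D800..U+DFFF falls inside this range and does not
# encode characters, so subtract it to count Unicode scalar values only.
surrogates=$(range_size 0xD800 0xDFFF)

echo "code points:   $total"
echo "scalar values: $(( total - surrogates ))"
```

The first figure comes out at 127929, in line with the "about 128,000" estimate. Many of those code points are unassigned, so the number of characters that would actually render is smaller.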
It would be helpful if one could use hex in the ranges, like
{0x20..0x10FFFF} to enumerate all code points, leaving out the
control-chars area at the start.
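
In the meantime a loop over arithmetic values gets close, since shell arithmetic accepts hex constants even though, as far as I know, brace expansion does not. A minimal sketch, assuming bash 4.2 or later and a UTF-8 locale for the character output; `codepoints` and `chars` are hypothetical helper names, not bash features.

```shell
#!/usr/bin/env bash
# codepoints START END: hypothetical helper, prints one "U+XXXX" label per
# code point from START to END inclusive. Endpoints may be hex (0x...).
codepoints() {
    local cp=$(( $1 )) end=$(( $2 ))
    while [ "$cp" -le "$end" ]; do
        printf 'U+%04X\n' "$cp"
        cp=$(( cp + 1 ))
    done
}

# chars START END: hypothetical helper, prints the characters themselves,
# space separated. Relies on bash's \UXXXXXXXX printf escape (bash 4.2+);
# the bytes written depend on the current locale.
chars() {
    local cp=$(( $1 )) end=$(( $2 )) esc
    while [ "$cp" -le "$end" ]; do
        printf -v esc '\\U%08X' "$cp"   # build an escape such as \U00002160
        printf "$esc "
        cp=$(( cp + 1 ))
    done
    printf '\n'
}

codepoints 0x2160 0x2163   # ROMAN NUMERAL ONE through FOUR
chars 0x2160 0x217F        # the Roman-numeral range used in the script above
```

Running a range such as 0x20 through 0x10FFFF this way would also pass through the surrogates and many unassigned code points, so a real implementation would have to decide how to treat those.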

>> It is in unicode code point order. Which is what you would use
>> for unicode. If you want to sort via unicode, use the -u switch.
>
> That isn't what the sort -u option does, and you know it. I hope.

Yeah...though, _I don't remember it_ _if_ I'm not entering it in
a command to use it. I.e., just as I'm about to type "|uniq", I'll
remember that '-u' is a recently added 'shortcut' to get the same
output.