bug-bash

Re: Unicode range and enumeration support.


From: L A Walsh
Subject: Re: Unicode range and enumeration support.
Date: Mon, 23 Dec 2019 21:34:56 -0800
User-agent: Thunderbird

On 2019/12/23 12:58, Greg Wooledge wrote:
> On Mon, Dec 23, 2019 at 12:52:00PM -0800, L A Walsh wrote:
> >    But it wasn't.  It was about generating characters between two
> > characters that were given.  In unicode, that would be two code points.
> > Nothing about enumeration.
>
> Please give an example, with a starting character and an ending
> character, and the resulting output.  Explain why a bash user who
> uses your implementation to echo {ñ..💩} (N WITH TILDE to PILE OF
> POO) or whatever will feel that your answer is correct and sensible.

#!/usr/bin/perl
use warnings;use strict; use v5.16;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)'); # avoid "Wide character in print"
use constant sch => q(Ⅰ); # start at roman numeral one (U+2160)
use constant ech => q(ⅿ); # end at small roman numeral 1000 (U+217F)
use constant scp    => ord(sch);
use constant ecp    => ord(ech);

my %range;
$range{$_} = $_ for scp .. ecp;
# keys are code points, so sort them numerically rather than as strings
my @range = sort {$a <=> $b} keys %range;
my $cnt=1000000;
my $RE=qr{\pN};

for (1 .. $cnt) {
   my $out="";
   for my $v (@range) {
       my $ch=chr($v);
       $out .= $ch.q( ) if $ch =~ m{$RE}; #match unicode property "is_num"
   }
   print $out."\n";
}

------------
1 million runs of the central loop take about 31s, so a single run
costs roughly 31 microseconds, which is pretty fast.

Unicode has 17 planes of 64K code points each = 1,114,112 code points,
of which about 10% are currently used
(https://www.babelstone.co.uk/Unicode/HowMany.html).
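As a quick sanity check on those figures, the arithmetic can be sketched in shell. Only the 17 x 64K numbers come from the text above; the variable names are made up for illustration.

```shell
# Size of the Unicode code space: 17 planes of 0x10000 (65,536) code points.
planes=17
per_plane=$((0x10000))
total=$((planes * per_plane))

echo "total code points: $total"
# The highest code point is one less than the total, since numbering starts at 0.
highest=$(printf 'U+%X' "$((total - 1))")
echo "highest code point: $highest"
```

This prints a total of 1114112 and a highest code point of U+10FFFF.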

I'm not sure what you want me to say about the range you chose,
other than that it would be about 128,000 characters.  The argument,
for or against, would be about the same as for using {241..128169}.
I know you are trying to make some point, but I'm missing it.
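For reference, the size of that range can be worked out from the two code points. This is only a sketch: the hex values are the standard code points for U+00F1 and U+1F4A9, written as literals so the result does not depend on the locale, and the count includes the unassigned and surrogate code points that fall inside the range.

```shell
# Endpoints of {ñ..💩} as code points.
start=$((0xF1))     # U+00F1 LATIN SMALL LETTER N WITH TILDE
end=$((0x1F4A9))    # U+1F4A9 PILE OF POO

# Inclusive count of code points between the two endpoints.
count=$((end - start + 1))
echo "start=$start end=$end count=$count"
```

The output is `start=241 end=128169 count=127929`, which matches the "about 128,000" estimate.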

It would be helpful if one could use hex in the ranges, like
{0x20..0x10FFFF} to enumerate all code points, leaving out the
control-chars area below 0x20.
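Until something like that exists, a loop can stand in for a hex range, since bash arithmetic already accepts 0x literals even though brace expansion only takes decimal integers or single letters. The sketch below is an assumption-laden illustration, not a proposed implementation: `cprange` is a hypothetical helper name, it needs bash 4.2 or later for the `\U` escape in `printf`, and code points above 0x7F only come out right in a UTF-8 locale.

```shell
# Hypothetical helper (not a bash builtin): print the characters for the
# code points from $1 to $2 inclusive, separated by spaces.
# Arguments may be decimal or 0x-prefixed hex.
cprange() {
    local cp esc ch out=""
    for ((cp = $1; cp <= $2; cp++)); do
        printf -v esc '\\U%08X' "$cp"   # build the escape, e.g. \U00000041
        printf -v ch "$esc"             # let printf decode the escape
        out+="$ch "
    done
    printf '%s\n' "${out% }"            # drop the trailing space
}

cprange 0x41 0x46    # prints: A B C D E F
```

In a UTF-8 locale, `cprange 0x2160 0x217F` would list the same Roman numeral block as the Perl script, without the \pN filter.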

> >    It is in unicode code point order.  Which is what you would use
> > for unicode.  If you want to sort via unicode, use the -u switch.
>
> That isn't what the sort -u option does, and you know it.  I hope.

Yeah...though _I don't remember it_ _if_ I'm not entering it in
a command to use it.  I.e., just as I'm about to type "|uniq", I
remember that '-u' is a recently added 'shortcut' to get the same
output.
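To make the point about -u concrete, here is a small sketch with made-up sample input showing that `sort -u` removes duplicate lines in the same way as `sort | uniq`. It has nothing to do with Unicode ordering.

```shell
# Sample input with duplicate lines (illustrative data only).
input=$(printf '%s\n' b a b c a)

via_uniq=$(printf '%s\n' "$input" | sort | uniq)
via_u=$(printf '%s\n' "$input" | sort -u)

printf '%s\n' "$via_u"
[ "$via_uniq" = "$via_u" ] && echo "same output"
```

Both pipelines print each distinct line once, in sorted order. Note that `uniq -u` is a different thing again: it prints only the lines that are not repeated.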




