Re: Patch for unicode in varnames...

bug-bash

From:	L A Walsh
Subject:	Re: Patch for unicode in varnames...
Date:	Tue, 06 Jun 2017 12:02:41 -0700
User-agent:	Thunderbird

Greg Wooledge wrote:

On Tue, Jun 06, 2017 at 07:01:23AM -0700, L A Walsh wrote:
George wrote:
On Mon, 2017-06-05 at 16:16 -0700, L A Walsh wrote:
George wrote:
On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote:
On 05/06/2560 15:52, George wrote:
there's not a reliable mechanism in place to run a script in a locale whose character encoding doesn't match that of the script
Right. The 1st paragraph (written by you), above, mentions that. Given the 1st paragraph (which no one is contesting), we are only
talking about the case where the run locale and script locale are the same.

Actually, I need to correct myself, since BASH and other computer
languages have run in non-native locales since forever.  The compilers
and interpreters don't necessarily pay attention to the current locale.
So of course the scripts will run in any locale.

My bad for my uncritical thinking (something that usually doesn't win
me any friends)...


> You need someone to explicitly say it?  OK, I'll say it.
>
> Scripts that can only *run* in a UTF-8 encoding-locale are a bad idea.
> The whole world is not UTF-8, despite what a few people seem to think.

No one said the entire world is UTF-8.  But it seems like most of the major
linux vendors default to UTF-8 for console activity.  Bash *is* the linux
shell.  It's being adopted elsewhere, but it seems to have first grown
in use in the linux community.  No one is saying it shouldn't continue
to support ASCII; in fact, to remain POSIX-compatible, it would have to.

> That's in addition to all of the issues that arise when trying to *edit*
> a script that was written in one or more character set encodings that
> are different from yours.

If you look at the locales on a recent linux distro (locale -a), out of
279 country codes, you'll see
  9 that only have a local encoding listed
 98 that have no encoding listed (guessing they default to POSIX)
172 that have UTF-8
  and
139 that ONLY have UTF-8 listed.

Just from the numbers if you _don't_ support UTF-8 you support
131 locales (47%).  If you do support UTF-8, you can reach
270 locales (97%).

It seems that by supporting UTF-8, you have a good chance of being
understandable in over 95% of the locales in the locale database,
vs. less than 50% if you don't.  (I used a quick Perl script to count
that; it's attached if you want to run it on your own data or point
out some mistake in my calculations.)

Seems like a reasonable tradeoff.
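
The percentages above can be rechecked with a few lines of shell. This is only a sketch: the variable names and the `pct` helper are made up for illustration, and the derivation of 270 and 131 (no-encoding plus UTF-8 entries, then minus the UTF-8-only ones) is an assumed reading of the message, not something it states. Under that reading, neither total includes the 9 local-only entries.

```sh
# pct PART TOTAL: print PART as a whole-number percentage of TOTAL
# (hypothetical helper; awk does the floating-point division and rounding)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.0f\n", 100 * part / total }'
}

# Counts as quoted in the message
total=279        # locale names counted
no_enc=98        # no encoding listed
local_only=9     # only a local (non-UTF-8) encoding listed
utf8=172         # UTF-8 listed, possibly alongside a local encoding
utf8_only=139    # UTF-8 is the only encoding listed

# Assumed derivation of the two totals
both=$((utf8 - utf8_only))               # UTF-8 plus a local encoding
with_utf8=$((no_enc + utf8))             # reachable if UTF-8 is supported
without_utf8=$((with_utf8 - utf8_only))  # reachable if it is not

echo "UTF-8 and local: $both"
echo "without UTF-8:   $without_utf8 of $total ($(pct "$without_utf8" "$total")%)"
echo "with UTF-8:      $with_utf8 of $total ($(pct "$with_utf8" "$total")%)"
```

Swapping in the counts from your own `locale -a` output shows how much the split depends on which locales a distro happens to ship.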

> One could almost make a viable case that
> "only people on UTF-8 computers are allowed to be developers".  Almost.

---
   I already stated that if you want to create your script for wide
distribution, you should use the least common denominator, but for your
own systems, shouldn't you have the freedom to write in your native
locale?

> But if you also intend to exclude such people from even being able to
> *run* your script, I can't take any of this seriously.

I thought it was the case that such scripts would *run* regardless of
how they looked in an editor.

> (OK, in reality, I am not taking any of this seriously.  This entire
> proposal and discussion are like some bizarre fantasy land to me.  Bash
> is a SHELL, for god's sake. Not a serious programming language.

Right, but ksh93 & zsh already support UTF-8 in varnames, while
ksh93, zsh, bash and mksh all support UTF-8 in function names.

#!/usr/bin/env perl

use warnings; use strict;
use P; use Types::Core;         #Note: in CPAN; load w/ "cpan -i P Types::Core"

# NOTE: expects input on STDIN from "locale -a", as in
#       "locale -a | count"   (where "count" is this script)

my (%loc,%encs);

while (<>) {
        length || next;
        chomp;
        length || next;
        my ($l,$e) = m{^([-\w]+)(?:[.@]([-\w]+))?$};
        P "_=%s, (l,e=)(%s,%s)", $_, $l, $e unless $l;
        $loc{$l}={} unless HASH $loc{$l};
        if ($e) {
                $e="UTF-8" if $e eq "utf8";
                $loc{$l}->{$e}++;
        }
};
P "uniq langs  =       %3d", scalar keys %loc;

my ($nochars, $noutf8, $only_utf8, $utf8) = (0, 0, 0, 0);

for (keys %loc) {
        my $ep          = $loc{$_};
        my @elst        = keys %$ep;
        my $enum        = scalar @elst;
        if ($enum == 0) { ++$nochars }
        else {
                # Classify this language by the encodings seen for it.
                my ($hvloc, $hvutf8) = (0, 0);
                for my $enc (@elst) {
                        if ($enc eq "UTF-8") { $hvutf8 = 1 } else { $hvloc = 1 }
                }
                if    ($hvutf8) { ++$utf8; ++$only_utf8 unless $hvloc }
                elsif ($hvloc)  { ++$noutf8 }
        }
}
P "langs w/no encoding %3d", $nochars;
P "langs w/local only: %3d", $noutf8;
P "langs w/utf8:       %3d", $utf8;
P "langs w/utf8 only:  %3d", $only_utf8;
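
For anyone without the CPAN modules the script loads, roughly the same tally can be sketched with POSIX sh and awk. This is an approximation, not a port: the function name `tally_locales` and the output format are invented here, and it deliberately strips any `@modifier` suffix before looking for an encoding, so its counts may differ from the Perl script's on names such as `ca_ES@valencia`.

```sh
# tally_locales: read `locale -a` style names on stdin and print five counts.
# Hypothetical helper; treats "utf8" and "UTF-8" (any case) as the same encoding.
tally_locales() {
    awk '
    NF == 0 { next }                        # skip blank lines
    {
        name = $0; enc = ""
        sub(/@.*$/, "", name)               # drop any @modifier suffix
        dot = index(name, ".")
        if (dot) {                          # split "lang.encoding"
            enc  = substr(name, dot + 1)
            name = substr(name, 1, dot - 1)
        }
        if (!(name in seen)) { seen[name] = 1; n++ }
        if (enc == "") next
        if (tolower(enc) == "utf8" || tolower(enc) == "utf-8") has_utf8[name] = 1
        else has_local[name] = 1
    }
    END {
        for (name in seen) {
            u = (name in has_utf8); l = (name in has_local)
            if (!u && !l)     none++
            else if (l && !u) local_only++
            else { utf8++; if (!l) utf8_only++ }
        }
        printf "langs=%d none=%d local_only=%d utf8=%d utf8_only=%d\n",
               n, none, local_only, utf8, utf8_only
    }'
}
```

Typical use would be `locale -a | tally_locales`, which prints one summary line with the same four categories the Perl script reports plus the number of distinct names.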
