Re: RFE: Please allow unicode ID chars in identifiers

From: L A Walsh
Subject: Re: RFE: Please allow unicode ID chars in identifiers
Date: Tue, 13 Jun 2017 16:58:59 -0700
User-agent: Thunderbird

Chet Ramey wrote:
> On 6/2/17 6:23 PM, L A Walsh wrote:
>
>> As for unsupported systems, there is a reason they are no longer
>> supported.  The world is already using UTF-8.  It's only a few
>> luddites clinging to ascii as a last refuge. ;-)
>>
>> What display/OS do you have that you can't run UTF-8 on?
>
> This is a red herring. de_DE.UTF-8 and zh_KH.UTF-8 don't use the same
> character set.
   They use the same encoding.  Whether or not they use
the same character set is up to someone's preference.  Looking at my
local fonts, it looks like 'Code2000' covers both of those ranges:
German and Khmer?  But using one font for both isn't necessary.

   Even MS allows for different character sets
to be used for different parts of unicode.  Linux also has this
with its current font support.  So it's not that critical whether or
not different ranges need different character sets.  If someone
*wants* to display characters in a different language, they can
get fonts that support them.  They aren't that hard to find online.

If you want to check if you have coverage for a range online, you can
visit http://www.babelstone.co.uk/Unicode/babelmap.html

If you have a windows machine, you can run the local application from
It will tell you what ranges you have fonts for and what percent
coverage you have.  Example:

With Unicode 8.0 (1 version behind current), I have
203 out of 256 blocks with full coverage,
with 112,637 out of 120,737 characters (93.3%).

(oh poo, just loaded 9.0)...
only 200 out of 267 blocks w/full coverage and
only 112,676 out of 128,172 (87.9%) of the chars.
Just goes to show how you get more behind just sitting still.  ;-)

But if I want to write a script for public consumption, I'd
likely use a common subset of characters.

Forgive me if I'm misremembering, but hasn't Greg argued against
the ability to supply "libraries" of re-usable scripts due to
the ease with which names could conflict with each other and cause
script incompatibilities?

So how much are script libraries widely used across sites now?

If it is the case that script libraries had access to unicode
var & func names (and used it), wouldn't that significantly
decrease the chances of conflict?  Right now, what, ...
maybe A-Za-z_0-9 + maybe a few others == that's about 64 chars?

If even half the unicode letters were alphabetic / classified as ID
type chars, wouldn't that add over 60K new possibilities?
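
As a rough illustration of the scale involved, the sketch below counts the
code points that Python accepts in identifiers.  This is only an assumption
for illustration: it uses Python's own rules (str.isidentifier, which follows
the Unicode XID_Start/XID_Continue properties), not anything bash implements,
and the totals depend on the Unicode version bundled with the interpreter.

```python
import sys


def count_identifier_chars(limit=sys.maxunicode + 1):
    """Count code points below `limit` usable in a Python identifier.

    Returns (start, cont): how many code points may begin an identifier,
    and how many may appear after the first character.
    """
    start = 0
    cont = 0
    for cp in range(limit):
        ch = chr(cp)
        if ch.isidentifier():          # valid as the first character
            start += 1
        if ("a" + ch).isidentifier():  # valid after the first character
            cont += 1
    return start, cont


# ASCII only: A-Z, a-z, 0-9 and underscore.
ascii_start, ascii_cont = count_identifier_chars(128)

if __name__ == "__main__":
    print("ASCII:", ascii_start, "start chars,", ascii_cont, "continue chars")
    print("All of Unicode:", count_identifier_chars())
```

For the ASCII range this gives 63 continuation characters, which is the
"about 64" figure; the full-range count shows how much larger the pool gets
under one particular definition of "ID chars".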

Even if a character doesn't display in your locality, doesn't
mean it wouldn't work -- i.e. if I don't have a Cyrillic font
installed, that doesn't mean the script wouldn't work -- as
the characters would still be encoded as their Unicode values.
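
A small sketch of that point: the bytes of a name are fixed by the encoding,
whether or not any installed font can draw it.  The Cyrillic name here is a
made-up example, and Python merely stands in for whatever would read the
script.

```python
# Hypothetical variable name written in Cyrillic letters (U+0438 U+043C U+044F).
name = "\u0438\u043c\u044f"

# UTF-8 encoding depends only on the code points, not on any font.
encoded = name.encode("utf-8")

# Decoding the bytes recovers exactly the same code points.
round_trip = encoded.decode("utf-8")

if __name__ == "__main__":
    print(encoded.hex(" "))                 # the raw bytes of the name
    print([hex(ord(c)) for c in round_trip])
```

Two scripts that spell the name with the same code points carry identical
bytes, so a lookup by name matches even on a terminal that renders the
letters as empty boxes.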

Please note -- the subject -- it doesn't say allow any
locality in scripts -- but unicode.  More to the point, I'd
only suggest UTF-8 encoding.  That means all of today's scripts
would run "as is", (as the bottom 128 chars of unicode are
identical to ASCII).  It also solves issues of what encoding
is used.  If it's non-ASCII, it could (or would) default to UTF-8.
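
The compatibility claim can be checked with a short sketch.  The helper name
decode_script is hypothetical and only illustrates the "default to UTF-8"
idea; it is not how bash reads scripts.

```python
def decode_script(raw: bytes) -> str:
    """Hypothetical helper: treat all script source as UTF-8.

    Pure-ASCII input is already valid UTF-8, so existing scripts decode
    unchanged and no separate ASCII code path is needed.
    """
    return raw.decode("utf-8")


# Every 7-bit byte decodes under UTF-8 to the code point of the same value.
ascii_is_subset = all(bytes([b]).decode("utf-8") == chr(b) for b in range(128))

# A script written in plain ASCII reads the same under either decoding.
old_script = b'greeting="hello"\necho "$greeting"\n'
same_text = decode_script(old_script) == old_script.decode("ascii")

# Non-ASCII characters use only bytes >= 0x80, so they can never be
# mistaken for ASCII syntax characters such as '=' or '$'.
high_bytes_only = all(b >= 0x80 for b in "\u00e9\u0438\u4e2d".encode("utf-8"))

if __name__ == "__main__":
    print(ascii_is_subset, same_text, high_bytes_only)
```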
