Re: people working in Greg's locale (+euro) & display of Unicode names

bug-bash

From:	Chet Ramey
Subject:	Re: people working in Greg's locale (+euro) & display of Unicode names
Date:	Thu, 15 Jun 2017 16:11:12 -0400
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.1.1

On 6/15/17 11:22 AM, PePa wrote:
> On 15/06/2560 22:03, Chet Ramey wrote:
>> I don't know other languages well enough to point one out, but I can easily
>> imagine that a particular character is an "alphabetic" in, say, Mandarin,
>> but doesn't exist in someone's en_US character set.
> 
> I thought you were referring to a character existing in both sets. This is the
> reason why I think you should only concern yourself with
> characters that already have an established semantic in bash. Don't get
> bogged down in distinguishing classes in myriads of character sets. Just
> allow anything that isn't ASCII (but IS UTF-8 -- I'm talking about
> UTF-8, otherwise this discussion becomes impossible).

Seriously: not everyone uses a UTF-8 locale. Something that uses an
approach along the lines of Eduardo's patch won't have the UTF-8-only
problem.  If I undertake the effort to put this into bash, and commit to
supporting it forever (which is how these things go), I'm not going to
orphan non-UTF-8 users.

And no matter which way we go here, I can't see any advantage in
allowing invalid multibyte sequences in identifier names.

The proposal to, essentially, use isw* functions instead of is* ctype
functions to determine whether a (now wide) character is a valid
identifier character is a straightforward enhancement. You have to look
at every character anyway no matter what.
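
As a rough illustration of the isw* approach described above, here is a
minimal sketch in C. It is not bash's code and not Eduardo's patch; the
function name is_wide_identifier is hypothetical, and the rule it applies
(alphabetic or underscore first, then alphanumerics or underscores) is an
assumption modeled on the existing ASCII rule. It decodes the name with
mbrtowc() in the current locale and rejects invalid or incomplete
multibyte sequences.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns true if NAME is a valid identifier when
   each character is classified with the wide-character ctype functions
   of the current LC_CTYPE locale. */
bool is_wide_identifier(const char *name)
{
    mbstate_t state;
    size_t remaining = strlen(name);
    bool first = true;

    if (remaining == 0)
        return false;                 /* the empty string is not a name */

    memset(&state, 0, sizeof state);  /* start in the initial shift state */
    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, name, remaining, &state);

        /* (size_t)-1: invalid sequence; (size_t)-2: incomplete sequence;
           0: an embedded null wide character.  Reject all three. */
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return false;

        if (first) {
            if (!(iswalpha((wint_t)wc) || wc == L'_'))
                return false;
        } else {
            if (!(iswalnum((wint_t)wc) || wc == L'_'))
                return false;
        }

        first = false;
        name += n;
        remaining -= n;
    }
    return true;
}
```

A caller would need setlocale(LC_CTYPE, "") beforehand for the user's
locale to take effect; without it the "C" locale applies and only ASCII
letters, digits, and underscore are accepted. Every character is still
examined exactly once, as with the single-byte check.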

> 
>> I see a number of problems with using non-alphanumerics in shell
>> identifiers.  The real advantage to allowing this is to allow users to
>> put alphabetics from their own locales into shell identifiers.  There's
>> little reason to do it otherwise, and plenty of complications.
> 
> What are those problems and complications??

Mostly portability across character sets and maintainability concerns
(which, admittedly, are nobody's problems but mine).

>> As for the implementation, it's much easier to use isalpha/isdigit (and
>> their wide character equivalents) than to try and keep track of a blacklist
>> of characters across different locales.
> 
> I don't propose blacklists across locales, just blacklisting what
> already has an established meaning in bash, i.e. ASCII. All the rest is
> just fair game, if someone insists on using a thumbs-up icon in a
> variable name, why restrict that?? The restricting and policing is going
> to make this costly in terms of developer time and CPU time.

You still have to look at every character. The world isn't all UTF-8:
there are character sets where multibyte characters include characters
that are valid ASCII (including, I suspect, `=').
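
To make the point about non-UTF-8 encodings concrete, here is a small
sketch in C using Shift_JIS, where the second byte of a two-byte
character can fall in the ASCII range. The katakana character encoded as
0x83 0x5C is a well-known case: its trail byte is the same as ASCII
backslash. The helper names below are hypothetical and the lead-byte test
is hand-rolled for illustration only; real code would rely on the
locale's own multibyte functions rather than hard-coding one encoding.

```c
#include <stddef.h>

/* In Shift_JIS, bytes 0x81-0x9F and 0xE0-0xFC introduce a two-byte
   character (the upper part of the second range is used by vendor
   extensions). */
int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Naive scan: treats every byte as a character of its own. */
size_t count_naive(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Encoding-aware scan: skips both bytes of each two-byte character, so
   a trail byte is never mistaken for an ASCII character. */
size_t count_sjis_aware(const char *s, char c)
{
    size_t n = 0;

    while (*s != '\0') {
        if (sjis_is_lead((unsigned char)*s) && s[1] != '\0') {
            s += 2;               /* lead byte plus trail byte */
            continue;
        }
        if (*s == c)
            n++;
        s++;
    }
    return n;
}
```

For the byte string 0x83 0x5C followed by "=1", the naive scan reports one
backslash while the encoding-aware scan reports none; both find the one
real `='. Simply passing every non-ASCII byte through and treating every
ASCII-range byte as ASCII is therefore only safe in encodings such as
UTF-8, where multibyte sequences never contain ASCII-range bytes.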

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
