Re: RFE: Please allow unicode ID chars in identifiers

bug-bash

From:	Chet Ramey
Subject:	Re: RFE: Please allow unicode ID chars in identifiers
Date:	Tue, 13 Jun 2017 19:59:22 -0400
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.1.1

On 6/13/17 4:44 PM, tetsujin@scope-eye.net wrote:
> 
> 
> Please excuse the top-posting, this mail client isn't very good...
> 
> To some extent, tying the shell script language to the locale is
> unavoidable. However, one of the points I was trying to make is that, in
> principle, at least, this shouldn't be the case. If a script is written in
> a particular character encoding (and uses characters from that encoding in
> its function names or parameter names, for instance) it should still run
> correctly even if it's run in a different locale, just as a compiled
> program should be able to run in a locale other than the one in which its
> source code was authored.

This isn't a good comparison. Even a compiled program that calls one of
the ctype.h functions is dependent on the locale in which it's run.  A
script, since it's text and interpreted, has the same dependency, to an
even greater extent.  If C source code contains character strings that are
encoded in the author's locale, you're going to get indeterminate results
if you try to display them in an environment using a different locale.

You can mitigate this somewhat by using the mechanisms available to
control the locale: for a C program it's setlocale(), and for a script
it's the LC_* and LANG environment variables.
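
As a rough sketch of the script side, assuming a POSIX-style shell: setting
LC_ALL inside a subshell pins the locale for the expansion done there,
whatever locale the caller is using. The helper name byte_length is made up
for illustration and is not anything bash provides.

```shell
# Hypothetical helper (the name is illustrative): report the length of a
# string in bytes, independent of the caller's locale.
byte_length() {
    # The subshell keeps the locale change from leaking into the caller.
    (
        LC_ALL=C                 # in the C locale every byte is one character
        printf '%s\n' "${#1}"
    )
}

# U+00E9 encoded as UTF-8 is the two bytes 0xC3 0xA9.
e_acute=$(printf '\303\251')
byte_length "$e_acute"           # prints 2
```

Without the LC_ALL assignment, the result of ${#1} depends on the locale the
script happens to be run in, which is the dependency described above.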


> For that to work, basically the character encoding used to interpret the
> script should be (potentially) distinct from the one used to interact with
> the rest of the system.

What "rest of the system"? What "matters of I/O"?

> 
> ...But that gets complicated: the shell would need to interpret the script
> in its locale of origin, but still respect the locale for other matters of
> I/O. But since data in the shell intermingles with programming constructs
> in the shell (Variables get passed by name, command and function names get
> stored in (and invoked from) shell variables, variable values and "here"
> docs come from the script, etc.) it gets into questions like, do we have to
> track character encoding for each variable in the script? When do we
> transcode between encodings? And what happens when a transcoding isn't
> possible?
> 
> So maybe the whole thing is just reaching too far... But that's how I'd
> want to approach it: I'd want people to be able to use their character set
> in their scripts, but I'd want it to work in a way that a script, once
> written, can work regardless of the active locale.

I assume that by this you mean the user's locale. You can still force a
different one.
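
For instance, a script can override the locale for a single command by
putting the assignment in front of it. A minimal sketch, assuming a POSIX
sort utility; the function name sort_bytewise is illustrative only.

```shell
# Hypothetical wrapper (the name is illustrative): sort standard input by
# byte value, ignoring the collation rules of the user's locale.
sort_bytewise() {
    # The assignment applies only to this one invocation of sort.
    LC_ALL=C sort
}

printf 'b\nA\na\nB\n' | sort_bytewise
# prints A, B, a, b, one per line (uppercase sorts first in the C locale)
```

The same input sorted under the user's own locale may come out in a
different order, so forcing the locale is what makes the result repeatable.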

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
