bug-bash

Re: RFE: Please allow unicode ID chars in identifiers


From: tetsujin
Subject: Re: RFE: Please allow unicode ID chars in identifiers
Date: Tue, 13 Jun 2017 16:44:08 -0400


Please excuse the top-posting; this mail client isn't very good...

To some extent, tying the shell script language to the locale is
unavoidable. However, one of the points I was trying to make is that,
in principle, at least, this shouldn't be the case. If a script is
written in a particular character encoding (and uses characters from
that encoding in its function names or parameter names, for instance)
it should still run correctly even if it's run in a different locale,
just as a compiled program should be able to run in a locale other
than the one in which its source code was authored.

For that to work, basically the character encoding used to interpret
the script should be (potentially) distinct from the one used to
interact with the rest of the system.
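
To illustrate why the interpreting encoding matters, here is a minimal
Python sketch (not bash, and not a proposal for how bash should do it).
It takes the bytes a UTF-8 editor would store for one name and asks
whether they form an identifier under three encodings. The helper
identifier_status is hypothetical, and Python's str.isidentifier() is
only a stand-in for whatever "Unicode ID character" rule a shell might
adopt.

```python
# Illustration only: str.isidentifier() stands in for a hypothetical
# "Unicode ID character" rule. bash itself does not use this rule.

def identifier_status(raw: bytes, encoding: str) -> str:
    """Report how a byte string reads as an identifier under one encoding."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not even valid text in this encoding.
        return "undecodable"
    return "identifier" if text.isidentifier() else "not an identifier"


# The UTF-8 bytes for the name "groesse" spelled with o-umlaut and sharp s.
RAW_NAME = b"gr\xc3\xb6\xc3\x9fe"

if __name__ == "__main__":
    for enc in ("utf-8", "latin-1", "ascii"):
        print(enc, "->", identifier_status(RAW_NAME, enc))
```

The same bytes are a valid name under one encoding, punctuation-laden
text under another, and undecodable under a third, which is why a
script's meaning can change when only the active locale changes.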

...But that gets complicated: the shell would need to interpret the
script in its locale of origin, but still respect the active locale for
other matters of I/O. And since data in the shell intermingles with
programming constructs (variables get passed by name, command and
function names get stored in, and invoked from, shell variables,
variable values and "here" docs come from the script, etc.), it raises
questions like: do we have to track the character encoding of each
variable in the script? When do we transcode between encodings? And
what happens when a transcoding isn't possible?
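
As a concrete picture of the "transcoding isn't possible" case, here is
a small Python sketch (again, not bash). The function try_transcode is a
hypothetical helper, and returning None on failure is just one possible
policy; a shell could equally raise an error or substitute a
placeholder.

```python
from typing import Optional


def try_transcode(raw: bytes, src: str, dst: str) -> Optional[bytes]:
    """Re-encode raw from src to dst; return None if dst cannot represent it."""
    # Raises UnicodeDecodeError if raw is not valid text in src.
    text = raw.decode(src)
    try:
        return text.encode(dst)
    except UnicodeEncodeError:
        # At least one character has no representation in dst.
        return None


if __name__ == "__main__":
    # Characters that exist in both UTF-8 and Latin-1: transcoding works.
    print(try_transcode(b"gr\xc3\xb6\xc3\x9fe", "utf-8", "latin-1"))
    # CJK characters have no Latin-1 representation: transcoding fails.
    print(try_transcode("\u5909\u6570".encode("utf-8"), "utf-8", "latin-1"))
```

A value that starts life in the script's encoding may simply have no
form in the locale's encoding, so any design along these lines has to
pick a failure policy.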

So maybe the whole thing is just reaching too far... But that's how
I'd want to approach it: I'd want people to be able to use their
character set in their scripts, but I'd want it to work in a way that
a script, once written, can work regardless of the active locale.

----- Original Message -----
From: chet.ramey@case.edu
To: <tetsujin@scope-eye.net>, "dualbus" <dualbus@gmail.com>,
 "L A Walsh" <bash@tlinx.org>
Cc: <chet.ramey@case.edu>, "bug-bash" <bug-bash@gnu.org>
Sent: Tue, 13 Jun 2017 15:04:24 -0400
Subject: Re: RFE: Please allow unicode ID chars in identifiers

On 6/2/17 12:54 PM, tetsujin@scope-eye.net wrote:

> - As you pointed out, this requires the shell to somehow establish a
> convention governing the character set used to interpret shell scripts

It's actually the same one that is currently used: the current locale.

>
> But, on the other hand:
> - Even if your editor or terminal can't display the UTF-8 code, that
> doesn't mean the shell process can't RUN it.

As long as the locale is set appropriately.

> 2: For a script, the character encoding of commands must be explicitly
> specified, probably via a shell option.

You can already do this by setting the various locale environment
variables.



