bug-bash

bash variable names do not comply w/POSIX character set rules


From: Linda Walsh
Subject: bash variable names do not comply w/POSIX character set rules
Date: Sat, 05 Dec 2015 21:43:03 -0800
User-agent: Thunderbird




POSIX section 2.5.3, Shell Variables, says:

LC_CTYPE
Determine the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters), which characters are defined as letters (character class alpha) and <blank> characters (character class blank), and the behavior of character classes within pattern matching.

If I have LC_CTYPE set to a UTF-8 locale, then the Unicode rules for
how a character is classified (alpha, numeric, alphanumeric, etc.)
seem appropriate to use.

In the bash man page, there is a definition of 'name':
  name   A word consisting only of  alphanumeric  characters  and  under-
         scores,  and beginning with an alphabetic character or an under-
         score.  Also referred to as an identifier.

However, I was looking for a character to visually separate
a "class" from a variable in that class. I would have liked something
like a.b, but "." isn't alphanumeric.
"LATIN CAPITAL LETTER O WITH STROKE" (U+00D8) is alphabetic,
but it doesn't work:
 aØb=1
-bash: aØb=1: command not found
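
The failure above is consistent with bash accepting only ASCII letters,
digits, and underscores in a name. A minimal sketch of that rule follows,
assuming that is indeed the check being applied; is_ascii_name is a
hypothetical helper, not part of bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 consists only of ASCII letters,
# digits, and underscores, and does not start with a digit.
is_ascii_name() {
    # Force the C locale so the bracket ranges cover ASCII only.
    local LC_ALL=C
    local re='^[A-Za-z_][A-Za-z0-9_]*$'
    [[ $1 =~ $re ]]
}

for candidate in a_b a__b a.b 'aØb'; do
    if is_ascii_name "$candidate"; then
        printf '%s: accepted\n' "$candidate"
    else
        printf '%s: rejected\n' "$candidate"
    fi
done
```

Under that assumption, a doubled underscore (a__b) is one separator that
stays within the accepted set.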

The POSIX portable character set:
6. Character Set
6.1 Portable Character Set

Conforming implementations shall support one or more coded character sets. Each supported locale shall include the portable character set, which is the set of symbolic names for characters in Portable Character Set. This is used to describe characters within the text of POSIX.1-2008. The first eight entries in Portable Character Set are defined in the ISO/IEC 6429:1992 standard and the rest of the characters are defined in the ISO/IEC 10646-1:2000 standard.

ISO/IEC 10646 is Unicode, i.e. POSIX appears to base its definition of
alphanumeric characters, for example, on the Unicode character set.

So, theoretically, any alphanumeric class char from Unicode should work
as described in the bash manpages, to compose a "name" (variable or
subroutine name), but this doesn't seem to be the case.
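
One way to see the gap is to compare what the locale classifies as
alphabetic with what bash will take as a name. This is only a sketch:
is_locale_alpha and bash_accepts_name are hypothetical helpers, and the
result of the first one for non-ASCII characters depends on the LC_CTYPE
in effect and on that locale being installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the single character $1 is in the
# locale's alpha class (depends on the current LC_CTYPE).
is_locale_alpha() {
    [[ $1 == [[:alpha:]] ]]
}

# Hypothetical helper: asks bash itself, in a subshell, whether $1
# can be declared as a variable name.
bash_accepts_name() {
    ( declare -- "$1=1" ) 2>/dev/null
}

is_locale_alpha 'Ø' && echo "locale: Ø is alphabetic"
bash_accepts_name 'aØb' || echo "bash: aØb is not a valid name"
```

The subshell keeps the probe from leaving a variable behind in the
calling shell.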

I know this isn't a trivial POSIX requirement to meet, but given
GNU's and bash's changes to shell and Unix command behavior, it
seems support for the character set would be the foundation of POSIX
compatibility.

If it were me, I'd probably start by looking at Perl's handling of
Unicode (imperfect as it may be), which has had a lot of work put into it
and may be one of the more complete and up-to-date implementations of
Unicode character handling.  I'd try to see whether any part of it might
either give ideas for bringing bash into compliance or provide code that
could serve as a pattern for implementation.  But investigating it further
might yield other, better options for bash.  Dunno.

Is this something that's even been thought about or is planned for?

Thanks!
-Linda












