Re: var_is_valid_name

From: John Darrington
Subject: Re: var_is_valid_name
Date: Fri, 27 Mar 2009 10:12:26 +0900
User-agent: Mutt/1.5.13 (2006-08-11)

On Thu, Mar 26, 2009 at 04:58:13PM -0700, Ben Pfaff wrote:
     I think that getting rid of var_is_valid_name() would do more
     than what we want.  In particular, I think that a user of the GUI
     would then be able to create a variable that could not be used in
     syntax, which means that it could not be used in GUI procedures
     that internally use syntax.

We already have that problem :(

For example, if you load up the file at  you'll see that
it has variable names with non-ASCII characters.  Thus the GUI
generates syntax which the lexer thinks is invalid.  I think this is a
limitation of the lexer which needs to be addressed, but that's the
subject of another thread ...

     Variable names that would be
     troublesome include those that start with a digit (or consist
     only of digits), or contain special characters such as spaces or
     double quotes.

The problem with those criteria as you've described them is that they
depend upon the encoding.  For example, the byte 0x20, which is a space
in ASCII, might well be an ordinary character in some other encoding.
Similarly, a byte which is a digit in one character set could be a
letter in another.  This won't happen in any ISO encoding or in UTF-8,
but I just don't know about the general case.
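
To make the byte-level nature of those criteria concrete, here is a minimal C sketch of the check quoted above, covering only the examples the quote names. The function name name_is_troublesome is hypothetical (it is not part of PSPP). It assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-x, where the bytes 0x20 (space), 0x22 (double quote) and 0x30 to 0x39 (digits) always mean those ASCII characters. In UTF-8 every byte of a multibyte character is 0x80 or above, so it can never be mistaken for one of them.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of PSPP.  Applies the quoted criteria
   byte by byte, which is only meaningful in an ASCII-compatible
   encoding such as UTF-8 or ISO-8859-x. */
bool
name_is_troublesome (const char *name)
{
  const unsigned char *p = (const unsigned char *) name;

  /* Starts with a digit; this also covers names made only of digits. */
  if (*p >= '0' && *p <= '9')
    return true;

  /* Contains a space or a double quote anywhere. */
  for (; *p != '\0'; p++)
    if (*p == ' ' || *p == '"')
      return true;

  return false;
}
```

For example, name_is_troublesome ("1abc") and name_is_troublesome ("my var") return true, while name_is_troublesome ("caf\xc3\xa9"), a UTF-8 encoded "café", returns false.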

     The real problem here is that we are disallowing an unreasonable
     number of characters in identifiers, right?  To fix that, we can
     adjust lex_is_id1 and lex_is_id2.  Perhaps all we need to do is
     to add "|| c >= 128" to the test in lex_is_id1().  Although that
     assumes that we are using a sane character encoding such as UTF-8
     or ISO Latin-#, we seem to be moving internally toward UTF-8 for
     everything anyhow.
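
A minimal sketch of the proposed change follows. Only the added "c >= 128" clause comes from the message; the ASCII characters accepted here (letters, @, # and $ as initial characters, plus digits, period and underscore afterwards) are an assumption about what the existing lex_is_id1() and lex_is_id2() accept. Converting to unsigned char first matters because plain char may be signed, in which case bytes of 0x80 and above would be negative and never compare >= 128.

```c
#include <ctype.h>
#include <stdbool.h>

/* Can C_ start an identifier?  The ASCII set is assumed; the
   "c >= 128" clause is the proposed addition, which accepts every byte
   of a non-ASCII character in UTF-8 or ISO-8859-x. */
bool
lex_is_id1 (char c_)
{
  unsigned char c = c_;         /* Plain char may be signed. */
  return isalpha (c) || c == '@' || c == '#' || c == '$' || c >= 128;
}

/* Can C_ appear after the first character of an identifier?  It
   inherits the new clause through lex_is_id1(). */
bool
lex_is_id2 (char c_)
{
  unsigned char c = c_;
  return lex_is_id1 (c_) || isdigit (c) || c == '.' || c == '_';
}
```

With this change, lex_is_id1 ('\xc3') is true, so a UTF-8 name such as "café" lexes as a single identifier, while lex_is_id1 ('1') stays false and lex_is_id2 ('1') stays true.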

In fact, we're only using UTF-8 internally in the GUI.  Thinking about
recent problems reported by international users has led me to
conclude that we have to delay conversion to UTF-8 until quite a high
level if we are to have any chance of being compatible with SPSS data
files and friendly to international users.
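
One way to picture the late conversion described above: names read from a system file are kept as raw bytes in the file's own encoding and are recoded to UTF-8 only where the GUI needs them. The sketch below uses the POSIX iconv() interface; the helper name recode_to_utf8 and the use of ISO-8859-1 in the example are assumptions for illustration, not PSPP code.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string IN from
   encoding FROM_ENC to UTF-8.  Returns a newly allocated string that
   the caller must free, or NULL on failure. */
char *
recode_to_utf8 (const char *in, const char *from_enc)
{
  iconv_t cd = iconv_open ("UTF-8", from_enc);
  if (cd == (iconv_t) -1)
    return NULL;                        /* Unknown source encoding. */

  size_t in_left = strlen (in);
  /* Generous bound: 4 output bytes per input byte.  If it is ever too
     small, iconv() fails with E2BIG and we return NULL cleanly. */
  size_t out_size = in_left * 4 + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_p = (char *) in;             /* POSIX iconv() takes char **. */
  char *out_p = out;
  size_t out_left = out_size - 1;       /* Leave room for the NUL. */
  if (iconv (cd, &in_p, &in_left, &out_p, &out_left) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;                      /* Invalid or incomplete input. */
    }

  *out_p = '\0';
  iconv_close (cd);
  return out;
}
```

For example, recode_to_utf8 ("caf\xe9", "ISO-8859-1") returns "caf\xc3\xa9", the UTF-8 form of "café". Because the raw bytes are kept until this point, the source encoding (from_enc) can be chosen per file.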

Anyway, as far as the current problem is concerned, var_is_valid_name
needs to be much less fussy.  I'll try your proposed solution since I
think it may work, as you say, for "sane" character sets.  If it
doesn't work, or if we find a system file with an insane encoding, then
we'll have to look at this again.


PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See or any PGP keyserver for public key.
