gnu-arch-users
[Gnu-arch-users] Increasing the filename space (Or: begging for trouble?)


From: Martin Thorsen Ranang
Subject: [Gnu-arch-users] Increasing the filename space (Or: begging for trouble?)
Date: 03 Feb 2004 03:15:43 +0100

Hi.  :-)

After several hours of trying to add the Norwegian letters

    LATIN SMALL LETTER AE:                  'æ'
    LATIN SMALL LETTER O WITH STROKE:       'ø'
    LATIN SMALL LETTER A WITH RING ABOVE:   'å'

and (capitals)

    LATIN CAPITAL LETTER AE:                'Æ'
    LATIN CAPITAL LETTER O WITH STROKE:     'Ø'
    LATIN CAPITAL LETTER A WITH RING ABOVE: 'Å'

to the set of characters accepted in filenames, by changing the file
=tagging-method so that the source regexp reads

    source ^[_=a-zA-ZæøåÆØÅ0-9].*$

I have some thoughts I would like to share.

I've been thinking about the purpose of =tagging-method, and it seems
to me (based on [among other sources] the tutorial and the reference
manual) that the exclude/junk/precious/backup/unrecognized/source
regexps provide a means to explicitly and implicitly define those
categories.

I've also studied some of the code in tla that handles files, and it
seems that at least portions of hackerlab are very Unicode-aware,
while the module char/char-class.[hc] is strictly ASCII-based and
explicitly states that it will not consider any locale settings.

Now, my problem seems to be located in the function
contains_illegal_character (char *filename) in the file
tla/libarch/invent.c.  Here I've included my suggested modification of
that function:

static int
contains_illegal_character (char * filename)
{
  int x;

  for (x = 0; filename[x]; ++x)
    if ((filename[x] == '*')
        || (filename[x] == '?')
        || (filename[x] == '[')
        || (filename[x] == ']')
        || (filename[x] == '\\')
        || (filename[x] == ' ')
        || (filename[x] == '\t')
        /* Suggested removal:
           || (!char_is_printable (filename[x]))
        */
        )
      return 1;

  return 0;
}
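
To illustrate why that one check matters for the letters above, here
is a minimal sketch.  It assumes that an ASCII-only printable test
accepts just the bytes 0x20 through 0x7e.  The names
ascii_is_printable and has_non_ascii_printable are hypothetical
stand-ins of mine, not hackerlab's actual code.  Under that
assumption, every byte of 'æ' is rejected, whether the name is encoded
as Latin-1 (0xe6) or as UTF-8 (0xc3 0xa6):

```c
#include <stddef.h>

/* Hypothetical stand-in for an ASCII-only printable test.
   Assumption: only bytes 0x20..0x7e count as printable. */
static int
ascii_is_printable (unsigned char c)
{
  return (c >= 0x20) && (c <= 0x7e);
}

/* Return 1 if any byte of NAME fails the ASCII-only test, else 0. */
static int
has_non_ascii_printable (const char * name)
{
  size_t x;

  for (x = 0; name[x]; ++x)
    if (!ascii_is_printable ((unsigned char) name[x]))
      return 1;

  return 0;
}
```

So has_non_ascii_printable ("\xc3\xa6.txt") and
has_non_ascii_printable ("\xe6.txt") both return 1, while a plain
ASCII name such as "README" returns 0.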

Now, I can see that the author of that function doesn't want any
"non-printable" characters in the inventory.  But, based on

1) Tom Lord's statement that "the file system is, after all, a form of
   database.  File names are a primary key for that database.
   Limiting the space of reasonably usable keys is lame."

... with which I agree and

2) The filename would (probably) not even have been there in the first
   place unless somebody actually needed it or at least could see it
   (i.e. it's printable).

and

3) If my assumption about the intention of the =tagging-method regexps
   is right, then _those_regexps_ should be the controlling variables,
   _not_ a statically compiled and very restrictive set of characters.

I wonder: could you accept the modification suggested above?  I
suppose that if you don't, the thing to do would be to add Unicode or
locale-aware filename handling, but this could certainly help a lot in
the meantime.
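
For comparison, a locale-aware check could look roughly like the
sketch below.  It is only an illustration built on the standard C
functions mbrtowc and iswprint, not a patch against tla.  The name
contains_unprintable_mb is hypothetical, and the sketch assumes the
program has already selected the user's locale with
setlocale (LC_CTYPE, "").  It would sit next to the tests for the
glob characters, not replace them:

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical locale-aware variant of the printable test.
   Return 1 if FILENAME contains an invalid multibyte sequence or a
   character that is not printable in the current locale, else 0. */
static int
contains_unprintable_mb (const char * filename)
{
  mbstate_t state;
  size_t left = strlen (filename);

  memset (&state, 0, sizeof state);

  while (left > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, filename, left, &state);

      /* (size_t) -1: invalid sequence; (size_t) -2: truncated one. */
      if ((n == (size_t) -1) || (n == (size_t) -2))
        return 1;

      if (n == 0)
        break;

      if (!iswprint ((wint_t) wc))
        return 1;

      filename += n;
      left -= n;
    }

  return 0;
}
```

Whether a name containing 'æ' passes then depends on the locale in
effect, which is exactly the kind of dependency char-class.[hc]
avoids on purpose.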

Yours sincerely,



Martin Thorsen Ranang



