Re: Running script from directory with UTF-8 characters

From: Chris Vine
Subject: Re: Running script from directory with UTF-8 characters
Date: Tue, 22 Dec 2015 20:12:40 +0000

On Tue, 22 Dec 2015 17:55:58 +0200
Marko Rauhamaa <address@hidden> wrote:
> Chris Vine <address@hidden> wrote:
> > On Tue, 22 Dec 2015 03:14:18 +0200
> > Marko Rauhamaa <address@hidden> wrote:  
> >> For example,
> >> 
> >>     scheme@(guile-user)> (opendir ".")
> >>     $1 = #<directory stream f7ffa0>
> >>     [...]
> >>     scheme@(guile-user)> (readdir $1)
> >>     $4 = "?9t\x1b["
> >>     scheme@(guile-user)> (open-file $4 "r")
> >>     ERROR: In procedure open-file:
> >>     ERROR: In procedure open-file: No such file or directory:
> >> "?9t\x1b["  
> >
> > You can set the locale in the REPL, if that is where you are working
> > from (as in your example), and then UTF-8 pathnames will work
> > fine.  
> You misunderstood me. The problem is that Guile cannot deal with
> non-UTF-8 pathnames in a UTF-8 locale. IOW, Linux pathnames are *not*
> strings. They are bytevectors. Guile 1.x (as well as Python 2.x) was
> fine with bytevector pathnames, but Guile 2.x (as well as Python 3.x) wants
> to pretend filenames are strings. That leads to trouble, potentially
> even to security vulnerabilities.
> A very typical case is a tarball that contains, say, Latin-1
> filenames. If you should expand the tarball in a UTF-8 environment,
> Guile wouldn't be able to deal with the situation.

Yes, you exceeded my powers of deduction (or clairvoyance, depending on
how you look at it).

More to the point, unix-like pathnames are at the implementation level
just a collection of bytes terminated by null and with '/' as the
directory separator. Having said that, the POSIX Portable Filename
Character Set (§3.278 of the SUS) doesn't even cover all of ASCII, let
alone Unicode.
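
To illustrate the bytes-versus-strings point, here is a minimal sketch in
Python 3 (mentioned above as behaving like Guile 2.x), not in Guile. It
assumes a POSIX system; the helper names list_raw_names and is_valid_utf8
are made up for this example. Passing a bytes path to os.listdir returns
the names as raw bytes, so nothing is decoded through the locale:

```python
import os
import tempfile


def list_raw_names(directory):
    """Return the entries of `directory` as bytes, undecoded."""
    # Passing a bytes path makes os.listdir return bytes names.
    return sorted(os.listdir(os.fsencode(directory)))


def is_valid_utf8(raw_name):
    """Report whether a bytes filename happens to be valid UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical demo: one plain ASCII file in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "plain.txt"), "w").close()
        for raw in list_raw_names(tmp):
            print(raw, is_valid_utf8(raw))
```

A Latin-1 name such as "café.txt" encoded as Latin-1 fails the UTF-8
check, which is the tarball case described in the quoted text.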

It can be useful to handle filenames as strings in the program.  My
main objection is not that filenames are not treated as collections of
bytes, but that guile assumes the filename character set is the same as
the locale character set, which on distributed file systems may be
completely false.  I may be wrong, but I do not think you can set the
filename codeset programmatically in guile, which most other libraries

So I guess the best rule is that, even if you don't stick to the
Portable Filename Character Set, stick to ASCII for filenames/paths.
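
As a rough sketch of that rule (again in Python, with hypothetical
helper names), the following checks a single filename against the
Portable Filename Character Set, which consists of A-Z, a-z, 0-9,
period, underscore and hyphen, and separately checks whether a whole
path is plain ASCII:

```python
import string

# The POSIX Portable Filename Character Set: letters, digits, '.', '_', '-'.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_filename(name):
    """True if `name` is non-empty and uses only portable characters."""
    return bool(name) and all(ch in PORTABLE_CHARS for ch in name)


def is_ascii_path(path):
    """True if every character of `path` is in the ASCII range."""
    return all(ord(ch) < 128 for ch in path)


if __name__ == "__main__":
    for candidate in ["report-2015.txt", "my file.txt", "café.txt"]:
        print(candidate, is_portable_filename(candidate), is_ascii_path(candidate))
```

The first check applies to one path component only, since '/' is not in
the portable set; the second is the looser "stick to ASCII" rule.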

