
Re: guile can't find a chinese named file


From: tomas
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 12:48:20 +0100
User-agent: Mutt/1.5.21 (2010-09-15)


On Wed, Feb 15, 2017 at 10:15:33AM +0000, Chris Vine wrote:

[...]

> I don't disagree.  My purpose was to point out that in the modern
> world of networking and plug-in devices, locales and filenames are
> disjoint.
> 
> The glib approach is better than assuming all filenames are in locale
> encoding, but it is by no means perfect.  I came across exactly this
> problem when writing a small application, mainly for my own use, to
> manage music files (actually mainly podcasts) on a USB music stick.
> The stick had its filenames in UTF-8 (somewhat confusingly the text in
> its index files, which had UTF-8 names, was in UTF-16).  This meant
> that if the computer on which the stick was mounted used a different
> filename encoding, any full path (mount point plus file name) could
> end up in a mixed encoding.
> Because gio's GFile insists that its filenames with path are in the
> encoding set by G_FILENAME_ENCODING, this meant GFile was only
> guaranteed to work when the stick was mounted on a computer with
> filename encoding set to UTF-8.

A very instructive example, thanks :-)

> In the end I just used the standard POSIX functions to open, close,
> read and write files which, because linux is codeset agnostic, worked
> fine.  To display filenames in GTK+, I was able to apply
> g_filename_to_utf8() to the mount point only and know that the
> remainder of the file name was guaranteed to be in UTF-8 already.
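
If one were doing the display side of that from Guile rather than C,
the same idea might look roughly like this (a sketch only -- the
procedure and its arguments are invented for illustration; the mount
point gets decoded from whatever encoding the mounting host uses, and
everything below it is trusted to be UTF-8 already):

    (use-modules (ice-9 iconv)
                 (rnrs bytevectors))

    ;; Only the mount point needs converting; the names on the stick
    ;; itself are known to be UTF-8.
    (define (display-name mount-point-bytes rel-path-bytes host-encoding)
      (string-append
       (bytevector->string mount-point-bytes host-encoding)
       "/"
       (utf8->string rel-path-bytes)))

    ;; e.g., on an ISO-8859-1 host:
    ;; (display-name (string->bytevector "/media/clé" "ISO-8859-1")
    ;;               (string->utf8 "Tomás.ogg")
    ;;               "ISO-8859-1")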

[...]

> I would prefer guile to make the filename encoding a fluid.  It wouldn't
> deal with files mounted with mixed encodings, but it would cater for
> everything else.
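
Just to make that proposal concrete, I suppose it would look
something like the following -- purely hypothetical, a fluid named
%filename-encoding does not exist in Guile today:

    ;; Hypothetical sketch of the quoted proposal; no such fluid
    ;; exists in Guile at present.
    (define %filename-encoding (make-fluid "UTF-8"))

    ;; File-system procedures would consult it when encoding and
    ;; decoding names, so for a stick known to carry UTF-8 names one
    ;; could write:
    (with-fluids ((%filename-encoding "UTF-8"))
      ;; ... open-input-file, opendir/readdir, etc. ...
      (fluid-ref %filename-encoding))   ; => "UTF-8"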

But why? I think either (a) having an internal encoding which is
"mostly UTF-8" but has space for raw bytes, as David describes, or
(b) keeping out of it completely, dealing in arrays of bytes and
providing the filename encoding only as an advisory value ("as far
as we can tell, those file names are encoded as FOO"), seems far
superior, since either approach will cope even with mixed encodings.
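
A tiny sketch of what (b) amounts to in Guile terms -- assuming
(ice-9 iconv), and with filename->display-string being a name I just
made up: the file name stays a bytevector, and the advisory encoding
is only consulted when rendering it for humans.

    (use-modules (ice-9 iconv))

    ;; 'substitute papers over bytes that don't decode, instead of
    ;; throwing, so even a wrongly-advised encoding still displays.
    (define (filename->display-string bytes advisory-encoding)
      (bytevector->string bytes advisory-encoding 'substitute))

    ;; (filename->display-string #vu8(84 111 109 195 161 115) "UTF-8")
    ;; => "Tomás"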

Of course (b) has a price too. I've seen XML parsers weep because
someone did a "substring" by hand, cutting a poor multibyte sequence
in half. Now it's *your* problem to get string operations right [1].
Loads of fun :-)
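
(The classic way to reproduce that misery, if anyone wants to: chop a
UTF-8 byte sequence at an arbitrary offset and try to decode it.)

    (use-modules (rnrs bytevectors) (ice-9 iconv))

    ;; "Tomás" is 6 bytes in UTF-8; the á takes two (#xC3 #xA1).
    (define bytes (string->utf8 "Tomás"))

    ;; A naive "substring": keep the first 4 bytes, slicing the á in two.
    (define cut (make-bytevector 4))
    (bytevector-copy! bytes 0 cut 0 4)

    (bytevector->string cut "UTF-8" 'error)  ; throws a decoding-error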

All in all, the Emacs way looks most enticing to me.

[1] And that's because file names *are* character strings, after
   all -- that's the point on which I disagree with Marko. But this
   is a "soft", "social" thing, not a technical one.

regards
-- tomás


