guile-user

Re: guile can't find a chinese named file


From: tomas
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 10:18:32 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Feb 14, 2017 at 10:19:14PM +0000, Chris Vine wrote:
> On Tue, 14 Feb 2017 21:52:01 +0000 (UTC)
> Mike Gran <address@hidden> wrote:
> [snip]
> > > In particular, filenames are *not*, nor can they be mapped to,
> > > Unicode strings in Linux.
> >
> > True. Linux should follow OpenBSD and make all locales UTF-8.
> 
> Filenames and locales are not necessarily related.  When you access a
> networked file system, you get the filename encoding you are given,
> which may or may not be the same as the particular locale encoding on
> your particular machine on one particular day, and may or may not be a
> unicode encoding.  Glib, for example, enables you to set this with the
> G_FILENAME_ENCODING environmental variable [...]

which is, btw., "just a better approximation", but still wrong: the
application creating a directory might have been "in" a different
locale (and thus have used a different encoding) than the one creating
the file within that directory.

Most notably, the whole path might cross several mount points, thus
the whole path can well have fragments coming from several file systems.

I think the only sane way to see a Linux file system path is the way
Linux sees it: as a byte string.
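
To illustrate the point about mixed encodings, here is a minimal Python
sketch (not Guile, and not anything Guile or Glib provides) that keeps
the path as bytes and tries to decode each component on its own. The
function name decode_components and the candidate list ("utf-8", "gbk")
are assumptions made up for this example.

```python
def decode_components(path: bytes, candidates=("utf-8", "gbk")):
    """Split a byte path on b"/" and try to decode each piece separately.

    Returns a list of (raw_bytes, text_or_None, encoding_or_None) tuples.
    The raw bytes are always kept, so nothing is lost when decoding fails.
    """
    result = []
    for raw in path.split(b"/"):
        if not raw:
            continue  # skip empty pieces from leading or doubled slashes
        for enc in candidates:
            try:
                result.append((raw, raw.decode(enc), enc))
                break
            except UnicodeDecodeError:
                continue
        else:
            # No candidate worked: hand the bytes back untouched.
            result.append((raw, None, None))
    return result


# A path whose directory was named under GBK and whose file under UTF-8.
mixed = b"/home/" + "文件".encode("gbk") + b"/" + "数据".encode("utf-8")
parts = decode_components(mixed)
```

Note that a successful decode is still only a guess: many byte
sequences are valid in more than one encoding.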

Sure, some helper infrastructure to try to make characters out of that
mess would be welcome, but it should be absolutely robust wrt.
unexpected input (e.g. bad UTF-8) and leave control to the application.
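
One existing approach to "robust wrt. bad UTF-8" is Python's
"surrogateescape" error handler (PEP 383), shown here only as an
illustration of the idea, not as something Guile does. Undecodable
bytes are smuggled through as lone surrogates, so the conversion never
fails and the original bytes can always be recovered. The helper names
to_text, to_bytes and is_clean are made up for this sketch.

```python
def to_text(raw: bytes) -> str:
    """Decode as UTF-8; bad bytes become lone surrogates U+DC80..U+DCFF."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: escaped surrogates turn back into the raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def is_clean(text: str) -> bool:
    """True if the string carries no escaped (undecodable) bytes."""
    return not any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


raw = b"ok-\xff-" + "文".encode("utf-8")
text = to_text(raw)          # never raises, even on bad UTF-8
round_trip = to_bytes(text)  # gives back exactly the original bytes
```

The application stays in control: it can check is_clean() before
showing the name to a user, and still open the file via to_bytes().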

Not easy.

> g_filename_to_utf8() and g_filename_from_utf8() functions for this
> purpose.

To me, that seems insufficient, unless this just applies to one
(e.g. the last) path element. Skimming the docs I can't see whether
you are only supposed to do that or whether you can dump whole paths
(or path fragments) into those functions.

>          You can tie the filename encoding to the locale encoding by
> defining the G_BROKEN_FILENAMES environmental variable but that is
> deprecated (the name suggests what they think about that idea).
> 
> You may possibly agree with this: I am not clear from your post what
> connection you were making between locales and filenames.  But if
> OpenBSD requires all _filenames_ to be in valid UTF-8, that is a bad
> decision in my view.

NT has done that too. I don't know: there are arguments for both
approaches -- it depends on whether you think file names are composed
of characters (makes sense, no?) or whether the OS doesn't care
what's in them (just leave null and slash alone!).

It's moving between those two views that's hard. Personally, I'd
tend to have Guile be agnostic (i.e. byte arrays) at the lowest
level (no conversions), and offer the application what it knows
(on BSD or "modern" Windows say: "yes, that's UTF-8" and on Linux
say "No idea, but you can try to convert").

Current locale is just a weak hint one might use in heuristics.
For things like environment variables and command line arguments,
locale is a stronger hint (but not 100%).
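
A rough Python sketch of "locale as a weak hint": the locale's
encoding is tried as a guess, the caller is told which encoding was
used, and a failure is reported instead of papered over. The function
name guess_text and its hint parameter are assumptions for this
example; locale.getpreferredencoding() is the standard library call.

```python
import locale


def guess_text(raw: bytes, hint=None):
    """Try to decode raw using hint, defaulting to the locale's encoding.

    Returns (text, encoding) on success and (None, None) on failure.
    Success only means the bytes are valid in that encoding, not that
    the name was actually written in it.
    """
    encoding = hint or locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding), encoding
    except (UnicodeDecodeError, LookupError):
        # Bad bytes, or an encoding name Python does not know.
        return None, None


gbk_name = "文".encode("gbk")
as_gbk = guess_text(gbk_name, hint="gbk")     # the hint happens to be right
as_utf8 = guess_text(gbk_name, hint="utf-8")  # the hint is wrong: no text
```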

> Linux is capable of treating filenames as just a null-terminated array
> of bytes with '/' as the directory separator.  It is encoding agnostic,
> and that works just fine.

Or not. For the OS all is fine, for the applications it's a small
hell -- see those Glib functions you quoted, which -- given their
interfaces -- can't possibly do the right thing (dropping their
names in a search engine to skim their documentation turns up
quite a lot of failure modes, if you know what I mean).

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlikHOgACgkQBcgs9XrR2kYBLACggihOlLCNLcUjlrsWh0vQMuH8
JxEAnRye7C4d1GNDJi7x6nLgI1PMamex
=+A5K
-----END PGP SIGNATURE-----


