guile-user

Re: guile can't find a chinese named file


From: Chris Vine
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 10:15:33 +0000

On Wed, 15 Feb 2017 10:18:32 +0100
<address@hidden> wrote:
> On Tue, Feb 14, 2017 at 10:19:14PM +0000, Chris Vine wrote:
[snip]
> > Filenames and locales are not necessarily related.  When you access
> > a networked file system, you get the filename encoding you are
> > given, which may or may not be the same as the particular locale
> > encoding on your particular machine on one particular day, and may
> > or may not be a unicode encoding.  Glib, for example, enables you
> > to set this with the G_FILENAME_ENCODING environment variable
> > [...]  
> 
> which is, btw., "just a better approximation", but still wrong: the
> application creating a directory might have been "in" a different
> locale (and thus have used a different encoding) than the one creating
> the file within that directory.
> 
> Most notably, the whole path might cross several mount points, thus
> the whole path can well have fragments coming from several file
> systems.
> 
> I think the only sane way to see a Linux file system path is the way
> Linux sees it: as a byte string.
> 
> Sure, some helper infrastructure to try to make characters of that
> mess will be welcome, but that should be absolutely robust wrt.
> unexpected input (e.g. bad UTF-8) and leave control to the application.
> 
> Not easy.

I don't disagree.  My purpose was to point out that in the modern
world of networking and plug-in devices, locales and filenames are
disjoint.

The glib approach is better than assuming all filenames are in locale
encoding, but it is by no means perfect.  I came across exactly this
problem when writing a small application, mainly for my own use, to
manage music files (actually mainly podcasts) on a USB music stick.
The stick had its filenames in UTF-8 (somewhat confusingly, the text in
its index files, which had UTF-8 names, was in UTF-16).  This meant
that if the computer on which the stick was mounted used a different
filename encoding, the full path to any file on the stick could be in a
mixed encoding: the mount point in the host's filename encoding and the
remainder in UTF-8.  Because gio's GFile insists that a full path is
entirely in the encoding set by G_FILENAME_ENCODING, GFile was only
guaranteed to work when the stick was mounted on a computer with
filename encoding set to UTF-8.

In the end I just used the standard POSIX functions to open, close,
read and write files which, because Linux is codeset agnostic, worked
fine.  To display filenames in GTK+, I was able to apply
g_filename_to_utf8() to the mount point only and know that the
remainder of the file name was guaranteed to be in UTF-8 already.
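
To make the split concrete, here is a minimal sketch in Python rather
than C, purely for illustration.  It is not the code from my
application: display_path() and its parameters are hypothetical names,
and it assumes the caller already knows the mount point and the host's
filename encoding.  The path stays a byte string for all file access
and is only decoded, piecewise, for display.

```python
def display_path(path: bytes, mount_point: bytes, host_encoding: str) -> str:
    """Decode a byte path whose mount point and remainder differ in encoding.

    Assumption: everything below the mount point is UTF-8 (as on the
    USB stick described above); the mount point itself is in the
    host's filename encoding.
    """
    prefix = mount_point.rstrip(b"/")
    if path != prefix and not path.startswith(prefix + b"/"):
        raise ValueError("path is not under the mount point")
    remainder = path[len(prefix):]
    # Decode each fragment with its own encoding; never raise on bad bytes,
    # since the result is only meant for display.
    return (prefix.decode(host_encoding, errors="replace")
            + remainder.decode("utf-8", errors="replace"))
```

The byte string itself (not the decoded text) is what should be handed
to open() and friends, so that nothing is lost in a round trip.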

> > g_filename_to_utf8() and g_filename_from_utf8() functions for this
> > purpose.  
> 
> To me, that seems insufficient, unless this just applies to one
> (e.g. the last) path element. Skimming the docs I can't see whether
> you are only supposed to do that or whether you can dump whole paths
> (or path fragments) into those functions.

You can do whatever you want with these functions.  They just convert a
text fragment from filename encoding to UTF-8 (if different).  They are
the filename encoding equivalent of g_locale_to_utf8() and
g_locale_from_utf8() for the locale encoding.  If you pass them a full
path, and that path is in a mixed encoding, the conversion won't work.
There are variants which will gracefully degrade in case of encoding
errors - g_filename_display_name() and g_filename_display_basename().
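
The difference between the strict and the degrading conversions can be
sketched by analogy in Python.  This is not the glib API: both function
names below are hypothetical, and the second only resembles the
display variants in spirit, by substituting U+FFFD for bytes it cannot
decode instead of failing.

```python
def to_utf8_strict(raw: bytes, filename_encoding: str) -> str:
    # Strict conversion: raises UnicodeDecodeError if any byte sequence
    # is invalid in the assumed filename encoding.
    return raw.decode(filename_encoding)


def to_display(raw: bytes, filename_encoding: str) -> str:
    # Degrading conversion: undecodable bytes become U+FFFD, so the
    # result is suitable for showing to a user but NOT for reopening
    # the file.
    return raw.decode(filename_encoding, errors="replace")
```

A path whose directory part is Latin-1 and whose last element is UTF-8
will make the strict version raise when UTF-8 is assumed, while the
display version still returns something readable.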

[snip]
> It's moving between those two views that's hard. Personally, I'd
> tend to have Guile being agnostic (i.e. byte arrays) at the lowest
> level (no conversions), and offer the application what it knows
> (on BSD or "modern" Windows say: "yes, that's UTF-8" and on Linux
> say "No idea, but you can try to convert").
> 
> Current locale is just a weak hint one might use in heuristics.
> For things like environment variables and command line arguments,
> locale is a stronger hint (but not 100%).

I would prefer Guile to make the filename encoding a fluid.  It wouldn't
deal with paths that mix encodings across mount points, but it would
cater for everything else.

Chris


