guile-user

Re: guile can't find a chinese named file


From: David Kastrup
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 10:54:06 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

<address@hidden> writes:

> On Tue, Feb 14, 2017 at 10:19:14PM +0000, Chris Vine wrote:
>> On Tue, 14 Feb 2017 21:52:01 +0000 (UTC)
>> Mike Gran <address@hidden> wrote:
>> [snip]
>> > > In particular, filenames are *not*, nor can they be mapped to,
>> > > Unicode strings in Linux.
>> > 
>> > True. Linux should follow OpenBSD and make all locales UTF-8.
>> 
>> Filenames and locales are not necessarily related.  When you access a
>> networked file system, you get the filename encoding you are given,
>> which may or may not be the same as the particular locale encoding on
>> your particular machine on one particular day, and may or may not be a
>> Unicode encoding.  GLib, for example, enables you to set this with the
>> G_FILENAME_ENCODING environment variable [...]
>
> which is, btw., "just a better approximation", but still wrong: the
> application creating a directory might have been "in" a different
> locale (and thus have used a different encoding) than the one creating
> the file within that directory.
>
> Most notably, the whole path might cross several mount points, thus
> the whole path can well have fragments coming from several file systems.
>
> I think the only sane way to see a Linux file system path is the way
> Linux sees it: as a byte string.
>
> Sure, some helper infrastructure to try to make characters of that
> mess will be welcome, but that should be absolutely robust wrt.
> unexpected input (e.g. bad UTF-8) and leave control to the application.
>
> Not easy.

If you tell Emacs that some external entity is in UTF-8, it will
represent all valid UTF-8 sequences as properly decoded characters, and
it has special codes for all bytes not part of valid UTF-8.

As a result, it handles valid UTF-8 exactly as expected, and it also
reproduces arbitrary byte streams thrown at it exactly when they are
decoded as UTF-8 and then re-encoded as UTF-8.

Guile lacks this byte-stream reproducibility when decoding and
re-encoding.  That makes it a whole lot less robust for dealing with
externally provided material.
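
To make the round-trip property concrete, here is a minimal sketch.  It
is neither Emacs nor Guile code: it assumes Python's "surrogateescape"
error handler (PEP 383) as a stand-in for the "special codes for invalid
bytes" idea, and uses the "replace" handler only as an analogy for a
decoder that cannot reproduce its input.  The file name and the helper
roundtrip() are made up for illustration.

```python
# Sketch only: Python's "surrogateescape" handler stands in for the
# Emacs behavior described above; this is neither Emacs nor Guile code.

def roundtrip(raw: bytes, errors: str) -> bytes:
    """Decode bytes as UTF-8, then re-encode with the same error handler."""
    return raw.decode("utf-8", errors=errors).encode("utf-8", errors=errors)

# Hypothetical file name: valid UTF-8 for two Chinese characters, followed
# by two bytes (0xFF, 0xFE) that can never occur in valid UTF-8.
raw_name = b"\xe4\xb8\xad\xe6\x96\x87-\xff\xfe.txt"

# Lossless: each undecodable byte is mapped to a lone surrogate in the
# range U+DC80..U+DCFF and mapped back to the same byte when encoding.
lossless = roundtrip(raw_name, "surrogateescape")

# Lossy: each undecodable byte is mapped to U+FFFD, so the original
# bytes cannot be recovered from the decoded string.
lossy = roundtrip(raw_name, "replace")

print(lossless == raw_name)
print(lossy == raw_name)
```

With the escaping handler the decoded string can still be used to open
the original file, because re-encoding yields the same bytes; with the
replacing handler it would name a different (probably nonexistent) file.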

-- 
David Kastrup



