guile-user

Re: guile can't find a chinese named file


From: Eli Zaretskii
Subject: Re: guile can't find a chinese named file
Date: Wed, 15 Feb 2017 22:32:57 +0200

> Date: Wed, 15 Feb 2017 21:20:56 +0100
> From: address@hidden
> Cc: address@hidden
> 
> > > Most notably, the whole path might cross several mount points, thus
> > > the whole path can well have fragments coming from several file systems.
> > 
> > A possible solution would be to decode each mount point's part as it
> > is being resolved.
> 
> ...which can only be based on guesswork: there's no reliable info on
> the encoding used for that file system (if it's consistent at all).

You could maintain a database of encodings per file system, perhaps
user-defined, or derived by some other means.  E.g., for volumes that
physically reside on Windows or macOS the encoding is pretty much
known in advance.
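A per-file-system encoding table of the kind suggested above could work roughly as follows. This is a minimal sketch under stated assumptions: the table contents, the mount points, and the helpers `MOUNT_ENCODINGS`, `owning_mount`, and `decode_path` are all hypothetical, and a real implementation would have to discover mount points from the system rather than hard-code them. Each path component is decoded with the encoding of the file system that holds its *parent* directory, because that is where the component's name is actually stored.

```python
from typing import Dict

# Hypothetical, user-maintained table: mount point (raw bytes) -> encoding
# used for file names on that volume. Entries are illustrative only, and
# the table must contain the root mount b"/".
MOUNT_ENCODINGS: Dict[bytes, str] = {
    b"/": "utf-8",
    b"/mnt/legacy": "gbk",  # e.g. a volume written by an old GBK-locale system
}


def owning_mount(dir_path: bytes, table: Dict[bytes, str]) -> bytes:
    """Return the longest mount point that contains dir_path (component-wise)."""
    best = b"/"
    for mount in table:
        inside = dir_path == mount or dir_path.startswith(mount.rstrip(b"/") + b"/")
        if inside and len(mount) > len(best):
            best = mount
    return best


def decode_path(raw: bytes, table: Dict[bytes, str] = MOUNT_ENCODINGS) -> str:
    """Decode an absolute path one component at a time.

    A component's name lives in its parent directory, so it is decoded with
    the encoding of the file system holding that parent.
    """
    if not raw.startswith(b"/"):
        raise ValueError("expected an absolute path")
    decoded = []
    parent = b"/"
    for part in raw.split(b"/"):
        if not part:
            continue  # skip empty components from leading or doubled slashes
        encoding = table[owning_mount(parent, table)]
        # surrogateescape keeps undecodable bytes recoverable instead of failing
        decoded.append(part.decode(encoding, errors="surrogateescape"))
        parent = parent.rstrip(b"/") + b"/" + part
    return "/" + "/".join(decoded)
```

For example, `decode_path(b"/mnt/legacy/" + "文件".encode("gbk"))` yields `"/mnt/legacy/文件"`, while the directory names `mnt` and `legacy` themselves are decoded as UTF-8 because they are stored on the root file system. Prefixes are compared component by component so that `/mnt/legacy2` is not mistaken for a path under `/mnt/legacy`.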

> > > I think the only sane way to see a Linux file system path is the way
> > > Linux sees it: as a byte string.
> > 
> > This would lose a lot in 99% of use cases.  You are, in effect,
> > suggesting a "reverse optimization", whereby the majority of use cases
> > is punished in favor of a small minority, based on theoretical
> > intractability.
> 
> I feel queasy doing some voodoo without the application having
> a word on it. In the Emacs context it's a bit easier, because in
> the "normal" case things are pretty quickly deferred to the user
> (usually).

Not really.  There are many internal operations that access files
and directories, and they would wreak major havoc if they didn't
succeed, silently, in the vast majority of uses.
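One known way to let such internal operations succeed silently without discarding the original bytes is the approach Python adopted in PEP 383: decode file names with the `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate code point so that the exact bytes can always be recovered. The sketch below is Python, not Guile or Emacs, and the helper names `name_to_text` and `text_to_name` are hypothetical.

```python
def name_to_text(raw: bytes) -> str:
    # Never raises: bytes that are not valid UTF-8 become U+DC80..U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    # Inverse of name_to_text: lone surrogates turn back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


raw = "文件.txt".encode("gbk")    # a GBK-named file on a UTF-8 system
text = name_to_text(raw)          # displays as mojibake, but no exception
assert text_to_name(text) == raw  # lossless round trip to the same bytes
```

The trade-off is that the decoded string may look garbled to a user, but any program that passes it back to the file system still reaches the right file.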

> > > NT has done that too.
> > 
> > Windows can do that because it also transparently translates file
> > names to the locale's encoding when files are accessed with ANSI APIs.
> > Without such translation, this kind of decision is unwise, IMO.
> 
> I guess (I don't *know*) Windows stores information about the encoding
> at file system level (and keeps that consistent).

No.  At the file system level (for NTFS volumes at least) Windows file
names are always UTF-16 encoded, and Windows just "knows" that.
Windows converts that to the locale's codepage when you access files
via an API that communicates file names encoded in that codepage.  (If
the conversion fails, you get question marks instead of the characters
that couldn't be converted.)
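The conversion described above can be imitated with ordinary codecs. This is only a simulation, not the Win32 API: the on-disk NTFS name is modeled as UTF-16LE bytes, the "ANSI" codepage is assumed to be cp1252 (Western European), and `errors="replace"` stands in for Windows substituting `?` for characters the codepage cannot represent. The helper names `ntfs_store` and `ansi_view` are hypothetical.

```python
def ntfs_store(name: str) -> bytes:
    # On-disk model: NTFS keeps file names as UTF-16 (little-endian) code units.
    return name.encode("utf-16-le")


def ansi_view(stored: bytes, codepage: str = "cp1252") -> bytes:
    # Model of an ANSI-API caller: decode the UTF-16 name, then re-encode it in
    # the locale's codepage, replacing unrepresentable characters with "?".
    return stored.decode("utf-16-le").encode(codepage, errors="replace")


stored = ntfs_store("文件.txt")
print(ansi_view(stored))           # b'??.txt' under cp1252
print(ansi_view(stored, "cp936"))  # the GBK codepage can represent both characters
```

Because the stored form is always UTF-16, the same file shows up intact or as question marks depending only on the caller's codepage; the name on disk never changes.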

> Linux doesn't have that; it just stays out of it. It doesn't even
> have a place to record the encoding used.

Exactly.  Which is why forcing a single file-name encoding on
Linux/Unix filesystems is IMO a bad idea.
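To see why a single forced encoding breaks down, consider a name written by a program running in a GBK locale and then read under a strict UTF-8 policy. In this Python sketch, `forced_utf8` is a hypothetical stand-in for a runtime that insists every file name be valid UTF-8.

```python
from typing import Optional

# The name "文件" as written by a program running in a GBK locale.
raw_name = "文件".encode("gbk")


def forced_utf8(name: bytes) -> Optional[str]:
    # Hypothetical policy: every file name must be valid UTF-8, or it is unusable.
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(forced_utf8(raw_name))  # None: under this policy the file cannot be named
```

A byte-string interface does not have this problem: most Linux file systems accept any bytes except `/` and NUL in a name, and Python's `os.listdir(b".")` returns such names as raw `bytes`, leaving their interpretation to the application.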


