guile-user

Re: guile can't find a chinese named file


From: Eli Zaretskii
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 07:54:06 +0200

> Date: Wed, 15 Feb 2017 22:15:52 +0100
> From: address@hidden
> Cc: address@hidden
> 
> > > > A possible solution would be to decode each mount point's part as it
> > > > is being resolved.
> > > 
> > > ...which can only be based on guesswork: there's no reliable info on
> > > the encoding used for that file system (if it's consistent at all).
> > 
> > You could maintain a database of encodings per file system, perhaps
> > user-defined, or derived by some other means.  E.g., for volumes that
> > physically reside on Windows or macOS the encoding is pretty much
> > known in advance.
> 
> This is what I mean by "voodoo".

Such "voodoo" is what Emacs does, more or less (not in this particular
use case, though).  This is what makes it so useful and successful.
Refusing to use such techniques because they are theoretically
imperfect is an obstacle to making useful software systems that
support multi-lingual environments.

> We don't even know that the encoding is consistent within one file
> system.

In almost all cases, it is.  Once again, the 99% vs 1% issue.

> An example would be the home dirs of different users running under
> different locales (an extreme example: they may have different 8-bit
> locales!).

Did you ever see such a use case in practice?

Besides, my suggestion works there as well, given a large enough
database that users can augment.
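A minimal sketch of such a user-augmentable database follows. It is in Python because the thread contains no code of its own. Everything in it is hypothetical: the table FS_ENCODINGS, its entries, and the helpers encoding_for and decode_path are illustrations, not part of Guile or Emacs. It assumes one encoding per directory prefix (a mount point or, say, one user's home directory) and picks the longest matching prefix.

```python
# Hypothetical, user-extensible table: directory prefix -> file-name encoding.
# The entries are illustrative assumptions, not defaults of any real system.
FS_ENCODINGS = {
    b"/": "utf-8",
    b"/mnt/winshare": "utf-8",   # e.g. a volume that lives on Windows
    b"/home/olduser": "gbk",     # e.g. a user running a GBK locale
}


def encoding_for(path, table=FS_ENCODINGS):
    """Return the encoding of the longest prefix in TABLE that contains PATH."""
    best = None
    for prefix in table:
        # PATH matches if it is the prefix itself or lies below it.
        if path == prefix or path.startswith(prefix.rstrip(b"/") + b"/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return table[best] if best is not None else "utf-8"  # assumed fallback


def decode_path(path, table=FS_ENCODINGS):
    """Decode PATH (bytes) with the encoding chosen for its location."""
    # surrogateescape keeps undecodable bytes recoverable instead of failing.
    return path.decode(encoding_for(path, table), errors="surrogateescape")
```

Users augment it by adding entries, e.g. `FS_ENCODINGS[b"/mnt/oldcd"] = "cp932"`. A wrong guess costs only display quality, because the surrogateescape fallback still preserves the bytes.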

> Anyway, having an encoding à la Emacs eases things a lot, since a
> string can at least survive unharmed a plain round trip.

That's a basic requirement, yes.
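For comparison, Python's counterpart of such a raw-byte-preserving representation is the surrogateescape error handler (PEP 383). Each undecodable byte becomes a lone surrogate in U+DC80..U+DCFF and is turned back into the same byte on encoding. This is an analogue only, not what Emacs or Guile do internally. The helper roundtrip and the sample file name are assumptions for illustration.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode RAW and encode it back without losing any byte."""
    # Each byte that is invalid in ENCODING becomes a lone surrogate...
    text = raw.decode(encoding, errors="surrogateescape")
    # ...which the same handler turns back into the original byte.
    return text.encode(encoding, errors="surrogateescape")


name = "中文.txt".encode("gbk")    # GBK bytes, not valid UTF-8
shown = name.decode("utf-8", errors="surrogateescape")
print(ascii(shown))                # the Chinese part is escaped, ".txt" survives
print(roundtrip(name) == name)     # True
```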

> The problem of properly displaying that remains unsolved.

This must be solved sufficiently in the majority of use cases; doing
that is not hard.  For the rest, there should be optional
settings/commands to get the correct display.  Example: the (now
largely unnecessary) rmail-redecode-body command in Rmail.
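An optional fix-up command can be sketched as a re-decoding step in the spirit of rmail-redecode-body (this is not its implementation). If the raw bytes survived the first, wrong decoding, a user who knows the right encoding can reinterpret what was shown. The function redecode and the sample name are hypothetical.

```python
def redecode(shown, wrong, right):
    """Reinterpret text that was decoded with the WRONG encoding as RIGHT.

    This only works if decoding with WRONG lost nothing: latin-1 maps all
    256 byte values, and surrogateescape preserves bytes invalid in WRONG.
    """
    raw = shown.encode(wrong, errors="surrogateescape")  # recover the bytes
    return raw.decode(right)


raw = "中文.txt".encode("gbk")
garbled = raw.decode("latin-1")              # a wrong guess used for display
print(redecode(garbled, "latin-1", "gbk"))   # 中文.txt

escaped = raw.decode("utf-8", errors="surrogateescape")
print(redecode(escaped, "utf-8", "gbk"))     # 中文.txt
```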

> Plus operations on that string (concatenation, e.g.).

No, this can be easily coded to support raw bytes.  Emacs does that.
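The same Python surrogateescape analogue shows why concatenation is unproblematic once raw bytes have a representation. This is a stand-in, not Emacs's own raw-byte characters. A string that mixes decoded text with escaped bytes supports ordinary string operations and still encodes back to the exact original bytes. The helper join_raw and the sample names are hypothetical.

```python
def join_raw(*parts, encoding="utf-8"):
    """Concatenate byte strings as text, keeping undecodable bytes intact."""
    # Each part is decoded separately; invalid bytes become lone surrogates.
    return "".join(p.decode(encoding, errors="surrogateescape") for p in parts)


dir_part = "目录/".encode("utf-8")     # valid UTF-8
file_part = "中文.txt".encode("gbk")   # raw GBK bytes, invalid as UTF-8

path = join_raw(dir_part, file_part)
print(path.startswith("目录/"))        # True: normal string operations work
back = path.encode("utf-8", errors="surrogateescape")
print(back == dir_part + file_part)    # True: no byte was lost
```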

> > No.  At the file system level (for NTFS volumes at least) Windows file
> > names are always UTF-16 encoded, and Windows just "knows" that.
> > Windows converts that to the locale's codepage when you access files
> > via an API that communicates file names encoded in that codepage.  (If
> > the conversion fails, you get question marks instead of the characters
> > that couldn't be converted.)
> 
> I see. That means that Windows has to use surrogates for everything
> beyond the BMP, right?

Yes.
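To make the surrogate answer concrete: UTF-16 stores a code point above U+FFFF as a pair of 16-bit units. The Python sketch below shows the arithmetic. utf16_units is a hypothetical helper, and its result is cross-checked against Python's own utf-16-le codec. The last lines only mimic, with Python codecs, the lossy conversion to a codepage described earlier in the thread. They do not call the Windows API, and the file name is an assumed example.

```python
def utf16_units(ch):
    """Return the UTF-16 code units for one character."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]                     # BMP: one 16-bit unit
    v = cp - 0x10000                    # 20 bits, split 10/10
    return [0xD800 + (v >> 10),         # high (lead) surrogate
            0xDC00 + (v & 0x3FF)]       # low (trail) surrogate


ch = "\U00020000"                       # U+20000, CJK Extension B, beyond the BMP
print([hex(u) for u in utf16_units(ch)])   # ['0xd840', '0xdc00']
print(ch.encode("utf-16-le").hex())        # 40d800dc: the same pair, little-endian

# Lossy conversion to a codepage, mimicked with Python codecs:
name = "中文.txt"
print(name.encode("cp1252", errors="replace"))      # b'??.txt'
print(name.encode("cp936").decode("cp936") == name)  # True: Chinese codepage
```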


