guile-user


Re: guile can't find a chinese named file


From: Marko Rauhamaa
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 09:16:21 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

Eli Zaretskii <address@hidden>:

> Btw, if by "UCS-2" you meant to say that only characters within the
> BMP are supported in file names on Windows, then this is wrong

No, I'm claiming Windows allows pathnames to contain isolated surrogate
code points, which cannot be decoded back to Unicode with UTF-16.

The situation is completely analogous to Linux pathnames that can
contain illegal UTF-8.
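Both cases can be shown with a strict decoder. This is a minimal Python sketch under stated assumptions: the Windows name is modelled as raw UTF-16LE code units in a bytes object (no Windows API is called), and the Linux name is an arbitrary byte string. The helper name `decodes_strictly` is hypothetical.

```python
def decodes_strictly(raw: bytes, encoding: str) -> bool:
    """True if raw decodes under `encoding` with the default strict errors."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

# Linux-style name: "cafe" with e-acute stored in Latin-1 (byte 0xE9),
# which is not a valid UTF-8 sequence on its own.
linux_name = b"caf\xe9"

# Windows-style name as UTF-16LE code units: "a", then a lone high
# surrogate U+D800 that is not followed by a low surrogate, then "b".
windows_name = b"a\x00" + b"\x00\xd8" + b"b\x00"

print(decodes_strictly(linux_name, "utf-8"))                     # False
print(decodes_strictly(windows_name, "utf-16-le"))               # False
print(decodes_strictly("caf\u00e9".encode("utf-8"), "utf-8"))    # True
```

Neither name can become a Unicode string under strict decoding, even though each is a perfectly good name at the byte or code-unit level.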

> : since Windows XP, NTFS volumes support file names with characters
> outside of the BMP. I've just successfully created files with such
> file names on Windows XP using Emacs.

Both Windows and Linux filenames support all of Unicode. Trouble is,
both of them support more than Unicode, making it impossible to use
Guile's strings for an arbitrary filename.

Python solves the problem by using a Unicode superset in its strings. I
think that's misguided, and Guile is correct in sticking to Unicode.
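The post does not name the Python mechanism. It is most likely the "surrogateescape" error handler (PEP 383), which is assumed here. Undecodable bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF. Those are code points but not Unicode scalar values, so a Python `str` acts as a superset of valid Unicode text. The sketch passes the codec explicitly rather than calling `os.fsdecode`, because the latter depends on the locale. The helpers `to_str` and `to_bytes` are hypothetical names.

```python
def to_str(raw: bytes) -> str:
    # Undecodable bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")

def to_bytes(name: str) -> bytes:
    # The same error handler maps those surrogates back to the original bytes.
    return name.encode("utf-8", errors="surrogateescape")

raw = b"caf\xe9"                 # not valid UTF-8
name = to_str(raw)
print(ascii(name))               # 'caf\udce9'
print(to_bytes(name) == raw)     # True: the round trip is lossless

# The decoded string is not valid Unicode text: strict encoding rejects it.
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate U+DCE9 cannot be encoded strictly")
```

The round trip works only because the string is allowed to hold code points that no Unicode encoding form can represent. That is the trade-off the post objects to.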

If I understood it correctly, someone just told us Emacs maps illegal
UTF-8 to another form of illegal UTF-8 and back. That's better in that
it's bytes to bytes (leaving Unicode out), but it's not immediately
obvious to me why you have to transform the byte sequence at all.

Look at the problem of concatenation. We could have a case where two
illegal UTF-8 (or UTF-16) snippets are concatenated to get valid UTF-8
(or UTF-16). That operation fails if you try to translate the snippets
to strings before concatenation. Such concatenation operations are
commonplace when dealing with filenames (e.g., split(1)).
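Python can illustrate the concatenation argument. In this minimal sketch, a character is split between its bytes (UTF-8) or between its surrogate code units (UTF-16LE). The fragments fail strict decoding, yet their byte-level concatenation decodes fine. The helper `strict_decode` and the sample characters are illustrative choices, not taken from the thread.

```python
def strict_decode(raw: bytes, encoding: str):
    """Decode strictly; return None instead of raising on invalid input."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# UTF-8: U+4E2D (a Chinese character) is the three bytes E4 B8 AD.
han = "\u4e2d".encode("utf-8")
head, tail = han[:1], han[1:]      # split in the middle of the character
print(strict_decode(head, "utf-8"), strict_decode(tail, "utf-8"))  # None None
print(strict_decode(head + tail, "utf-8") == "\u4e2d")             # True

# UTF-16LE: U+1F600 is the surrogate pair D83D DE00; split between the units.
pair = "\U0001F600".encode("utf-16-le")
hi, lo = pair[:2], pair[2:]
print(strict_decode(hi, "utf-16-le"), strict_decode(lo, "utf-16-le"))  # None None
print(strict_decode(hi + lo, "utf-16-le") == "\U0001F600")            # True

# Decoding the UTF-8 pieces with surrogateescape does not fail, but the
# concatenated string is not the character; only re-encoding recovers it.
joined = (head.decode("utf-8", "surrogateescape")
          + tail.decode("utf-8", "surrogateescape"))
print(joined == "\u4e2d")                                # False
print(joined.encode("utf-8", "surrogateescape") == han)  # True
```

Concatenating at the byte level always gives the right result. Converting each fragment to a string first either fails (strict) or yields a string that differs from the decoded whole (surrogateescape).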


Marko


