emacs-devel

Re: Multibyte and unibyte file names


From: Stephen J. Turnbull
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 22:03:28 +0900

Eli Zaretskii writes:

 > > "Unibyte" as implemented in Emacs is a premature optimization, and a
 > > disaster in search of places to happen.  Remove it, and you'll never
 > > notice it's gone.  The consequence of that removal would be to fix
 > > this problem, permanently.
 > 
 > I don't think you are entirely correct.

My preferred flavor of Emacs never had unibyte.  It's got its problems
in this area, but they're just lazy or over-ambitious programmer bugs,
not a design flaw.

 > We still need to send encoded (unibyte) strings to the outside
 > world.

Of course.  In fact, pretty much all interaction with the outside
world involves byte streams.  The problem Emacs is experiencing here
is that Lisp can see bytes when it is designed only to work with
characters.

 > [Determining file name encoding] a non-issue: we treat unibyte file
 > names as encoded in file-name-coding-system.  Nothing else is
 > supported, or needed.

It is an issue in Japan, where it's still common to have a host whose
hard drive uses UTF-8, mounting EUC-JP-encoded volumes over NFS, and
USB drives with Shift JIS file names.  I've even seen file names
containing segments encoded variously in KOI8, Shift JIS, *and* EUC-JP
(in Macintosh notation, no less).  Admittedly, not in a very long
time, but it's still *possible* to do that on POSIX systems.
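
To make the mixed-encoding situation concrete, here is a minimal
sketch in Python (not Emacs Lisp, purely for illustration).  The file
name and the helper `try_decode` are hypothetical; the point is only
that the same name has different byte sequences on each volume, and
that a single assumed file-name coding system can't decode all of them.

```python
from typing import Optional

# Hypothetical file name, as it might appear on three different volumes.
name = "日本語.txt"
encodings = ["utf-8", "euc_jp", "shift_jis"]

# The same characters produce a different byte string under each encoding.
encoded = {enc: name.encode(enc) for enc in encodings}


def try_decode(raw: bytes, enc: str) -> Optional[str]:
    """Decode raw with enc; return None if the bytes are undecodable."""
    try:
        return raw.decode(enc)
    except UnicodeDecodeError:
        return None


# Decode every byte string with every candidate encoding.  Only the
# matching pair recovers the name; the rest give mojibake or None.
results = {
    (src, guess): try_decode(raw, guess)
    for src, raw in encoded.items()
    for guess in encodings
}

for (src, guess), text in results.items():
    status = "ok" if text == name else "wrong or undecodable"
    print(f"{src:>9} bytes read as {guess:<9}: {status}")
```

Without out-of-band knowledge of which volume a name came from, there
is no reliable way to pick the right row.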

You just can't win in this environment; you will see mojibake, and
sometimes undecodable names, unless you get help from the user.  Such
names can be round-tripped using a special "undecodable bytes"
representation (UTF-8B or non-Unicode code points).  But if you try to
manipulate those names in Lisp, you will sometimes get incorrect
results.
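
As an illustration of the "undecodable bytes" idea, Python's
`surrogateescape` error handler implements the UTF-8B scheme: each byte
that can't be decoded is mapped to a lone surrogate code point and
mapped back on encoding.  The sketch below is Python, not Emacs Lisp,
and the file name is a made-up example; it shows that the round trip is
exact while character-level manipulation of the result is not
trustworthy.

```python
# A hypothetical EUC-JP file name seen by a program that assumes UTF-8.
intended = "日本語.txt"
raw = intended.encode("euc_jp")

# UTF-8B style decoding: undecodable bytes become lone surrogates
# instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# The round trip back to bytes is exact, so the file can still be opened.
round_tripped = text.encode("utf-8", errors="surrogateescape")
print(round_tripped == raw)

# The ASCII part of the name survives and can be inspected safely.
print(text.endswith(".txt"))

# Character-level operations give wrong answers: the decoded string
# does not have the same number of characters as the intended name.
print(len(text), "vs", len(intended))

# A strict encoder refuses the escaped string, so it can't be passed
# unchanged to code that expects well-formed text.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)
```

The name can be carried around and handed back to the OS intact, but
anything that treats it as a sequence of characters (length, search,
display) is working on the wrong data.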

 > Exactly.  Moreover, what you suggest is a large project that won't
 > happen without a motivated individual.  Given the overall "cannot
 > happen on POSIX, so it's SEP"

It can easily happen on POSIX systems, especially with removable media
or dual-booting hosts.  The problem is that most people don't care
about Japanese or Chinese, and of those that do, I'm sure most think
that Shift JIS and Big5 are abominations (except for a few Windows
users).

 > reaction I got to this thread, what do you think are the chances of
 > such a project to materialize any time soon?

Not my problem, either.  My preferred flavor of Emacs hasn't had
unibyte-related issues since 1998.

But I don't see why it should be so difficult.  You already have all
the functions needed to decode byte streams to Lisp strings or
buffers, and that's the normal mode of operation, no?  In fact AFAIK
the set of programs that use the unibyte feature at all is pretty
small, and most of those (like Tramp) do so only in self-defense.


