emacs-devel

Re: Multibyte and unibyte file names


From: Eli Zaretskii
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 12:54:20 +0200

> From: Stefan Monnier <address@hidden>
> Cc: address@hidden,  address@hidden,  address@hidden
> Date: Fri, 25 Jan 2013 17:28:40 -0500
> 
> > What I meant was to return decoded file names from all file-name
> > primitives, such as file-name-nondirectory, even if their input was
> > encoded.
> 
> It's probably OK to do that, but I wonder why we'd need to do it

It's not a goal in itself but a side effect: if every primitive
decodes any encoded file name on entry, it will thereafter manipulate
decoded strings throughout its execution, and will therefore return a
decoded string.  (We could, of course, encode it back if we found the
argument encoded, but then it isn't exactly clear what to do when some
arguments are encoded and others aren't; and pure-ASCII arguments are
not easily distinguished from encoded file names.)
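
To illustrate the decode-on-entry idea outside of Emacs, here is a
minimal sketch in Python rather than in Emacs's C.  The names
decode_on_entry and nondirectory are hypothetical, not Emacs
functions, and cp932 is just an assumed coding system.  Python marks
encoded input by its type (bytes), which is an assumption of the
sketch and not a description of how Emacs tells the two apart.

```python
def decode_on_entry(name, coding="cp932"):
    """Return NAME as decoded text; bytes input is treated as encoded."""
    if isinstance(name, bytes):
        return name.decode(coding)
    return name


def nondirectory(name, coding="cp932"):
    """Hypothetical analogue of file-name-nondirectory.

    The argument is decoded on entry, so the result is always decoded
    text, whether the caller passed encoded bytes or decoded text.
    """
    text = decode_on_entry(name, coding)
    return text.rsplit("/", 1)[-1]


# Encoded and decoded input yield the same decoded result.
encoded = "dir/表.txt".encode("cp932")
assert nondirectory(encoded) == nondirectory("dir/表.txt") == "表.txt"

# A pure-ASCII name has identical bytes in every ASCII-compatible
# encoding, so its bytes alone do not reveal whether it was "encoded".
ascii_name = "notes.txt"
assert (ascii_name.encode("cp932")
        == ascii_name.encode("utf-8")
        == ascii_name.encode("ascii"))
```

The last assertion is the ambiguity mentioned above: with pure-ASCII
arguments there is nothing in the bytes to base an "encode it back"
decision on.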

> under what circumstances could such a primitive receive an encoded
> file-name, if all the file names returned to Elisp (by things like
> directory-files) are already decoded?

One way is that a primitive gets called from C.  I gave one example of
this in my original message.  There aren't many such examples, but
if we _want_ to support encoded file names, the code needs to DTRT
with them, even if this happens only once in a blue moon.

> > The issue is in the file-name primitives that want to support both
> > encoded and decoded file names, and as I understand from this
> > discussion, this feature should stay.
> 
> Of course, we shouldn't just reject encoded filenames, but I don't see
> why we should worry too much about them.

I "worry" because they need separate code, especially with multibyte
encodings; writing that code for an encoding not supported by the
current locale is tricky at best, if not downright impossible, and
certainly inefficient.  Are you saying that since this happens
infrequently, we could process such file names in a broken way,
e.g. finding a directory separator where there's none, as demonstrated
in http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13515#5?
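
One well-known way a false separator can show up (not necessarily the
exact case in that bug report) is a multibyte encoding whose trailing
bytes overlap ASCII.  The sketch below is Python, not Emacs code; the
backslash separator, the cp932 coding system, and the helper names are
all assumptions chosen for illustration.

```python
SEP = "\\"  # directory separator assumed for this illustration

name = "表.txt"                  # the decoded name contains no separator
encoded = name.encode("cp932")  # 表 encodes as 0x95 0x5C; 0x5C is ASCII '\'


def has_separator_bytewise(raw):
    """Search the encoded bytes directly, ignoring character boundaries."""
    return SEP.encode("ascii") in raw


def has_separator_decoded(raw, coding):
    """Decode first, then search the decoded characters."""
    return SEP in raw.decode(coding)


assert encoded == b"\x95\x5c.txt"
assert has_separator_bytewise(encoded)              # false positive
assert not has_separator_decoded(encoded, "cp932")  # correct answer

# In UTF-8 every byte of a non-ASCII character is >= 0x80, so the
# bytewise search cannot hit the middle of a character.
assert not has_separator_bytewise(name.encode("utf-8"))
```

Handling the encoded form correctly means stepping through it
character by character in that particular encoding, which is the
separate code I'm talking about.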

> > So some things will never work with encoded file names, but I guess no
> > one cares, because most of those problems go away if the encoding is
> > UTF-8.  Fine; if no one cares, neither do I.
> 
> Actually, even with other coding systems, this shouldn't be a serious
> issue since encoded file names should be rare.

The code needs to be there anyway.  We cannot remove it, and we cannot
break it, because people will complain.


