Re: Multibyte and unibyte file names
From: Eli Zaretskii
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 13:27:12 +0200

> From: "Stephen J. Turnbull" <address@hidden>
> Cc: Stefan Monnier <address@hidden>,
> address@hidden,
> address@hidden,
> address@hidden
> Date: Sat, 26 Jan 2013 12:04:50 +0900
>
> "Unibyte" as implemented in Emacs is a premature optimization, and a
> disaster in search of places to happen. Remove it, and you'll never
> notice it's gone. The consequence of that removal would be to fix
> this problem, permanently.

I don't think you are entirely correct. We still need to send encoded
(unibyte) strings to the outside world. IOW, file names are not the
only user of unibyte strings.

> As Stefan says, there would remain a more general problem -- with
> the exception of Windows Unicode APIs -- that there is no absolutely
> reliable way of determining the user's intended encoding.

That's a non-issue: we treat unibyte file names as encoded in
file-name-coding-system. Nothing else is supported, or needed.

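
To make that rule concrete, here is a minimal sketch in Python rather
than Emacs Lisp or C. The constant FILE_NAME_CODING_SYSTEM and both
helper functions are hypothetical stand-ins for illustration only, not
Emacs APIs; the point is just that one configured codec is applied in
both directions, with no guessing:

```python
# Hypothetical stand-in for the file-name-coding-system setting.
FILE_NAME_CODING_SYSTEM = "iso-8859-1"


def decode_file_name(raw: bytes) -> str:
    """Decode bytes received from the OS with the one configured codec."""
    return raw.decode(FILE_NAME_CODING_SYSTEM)


def encode_file_name(name: str) -> bytes:
    """Encode a name for the OS with the same codec."""
    return name.encode(FILE_NAME_CODING_SYSTEM)


# Round trip: what goes out encoded comes back as the same text.
raw = encode_file_name("résumé.txt")
name = decode_file_name(raw)
```
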
> However, the only important cases where this interferes with usual
> filename parsing needs are Shift JIS and Big 5 on Windows, where you
> *do* have that absolutely reliable alternative.

Again, detecting the encoding is a non-issue. When I see an encoded
file name, I always _know_ how it was encoded, and I can decode it by
using DECODE_FILE.

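
As an illustration of why Shift JIS interferes with file name parsing,
and why decoding first avoids the problem, consider this small Python
sketch. It is not Emacs code; the example character and the choice of
Python's "shift_jis" codec are assumptions made for the demonstration.
Some Shift JIS characters have a second byte equal to the ASCII
backslash, so splitting the still-encoded bytes on the directory
separator cuts a character in half:

```python
# "表" is encoded in Shift JIS as the two bytes 0x95 0x5C; the second
# byte has the same value as an ASCII backslash.
name = "表.txt"
raw = name.encode("shift_jis")

# Parsing the encoded bytes: the backslash byte inside the character is
# mistaken for a directory separator.
naive_parts = raw.split(b"\\")

# Decoding with the known encoding first, then parsing, keeps the
# character intact.
decoded_parts = raw.decode("shift_jis").split("\\")
```

Here naive_parts has two broken pieces, while decoded_parts contains
the single, intact file name.
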
> The right thing to do in some sense is to have an "external file name
> type" which stores both the Emacs string name and (if the name was
> received as bytes from outside) a representation of those bytes.
> Rather than change the Lisp_String structure, I would recommend
> putting a property (`text-as-received', `externally-coded-text', or
> whatever) on the string. The content of the property would be the
> filename decoded as 'binary (or perhaps using Emacs's
> undecodable-bytes representation).
>
> Although Emacs doesn't seem to have string properties (ie, on the
> object), one can put a text property on the string (or use an overlay,
> which might work for the degenerate case of a 0-length string). This
> would allow callers (and sufficiently Type A users) to retry decoding
> with a different encoding.
>
> Of course this requires rather smart callers if they slice-n-dice the
> file name.

Exactly. Moreover, what you suggest is a large project that won't
happen without a motivated individual. Given the overall "cannot
happen on POSIX, so it's SEP" reaction I got to this thread, what do
you think are the chances of such a project materializing any time
soon?

And that is even before we start to talk about the details of your
proposal and consider its downsides (what to do when
file-name-coding-system is changed, the performance cost of too many
overlays, ...).
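
For concreteness, the quoted "text-as-received" idea and the
slice-n-dice problem can be sketched in Python. This is only an
analogy under stated assumptions: the class name ReceivedName and its
redecode method are hypothetical, and a Python str subclass merely
stands in for an Emacs string carrying a property.

```python
class ReceivedName(str):
    """A decoded name that also remembers the bytes it came from."""

    def __new__(cls, raw: bytes, coding: str):
        # errors="replace" so a wrong first guess still yields a string.
        self = super().__new__(cls, raw.decode(coding, errors="replace"))
        self.raw = raw
        return self

    def redecode(self, coding: str) -> "ReceivedName":
        """Retry decoding the original bytes with a different codec."""
        return ReceivedName(self.raw, coding)


raw = "表.txt".encode("shift_jis")
guess = ReceivedName(raw, "latin-1")   # wrong guess: garbled text
fixed = guess.redecode("shift_jis")    # retry from the saved bytes

# Slicing returns a plain str, so the saved bytes are silently lost
# unless the caller takes care to carry them along.
piece = fixed[:1]
```

The retry works only as long as the original bytes travel with the
string; the last line shows how easily an ordinary string operation
drops them.
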