emacs-devel
Re: master 6011d39b6a: Fix drag-and-drop of files with multibyte filenames


From: Eli Zaretskii
Subject: Re: master 6011d39b6a: Fix drag-and-drop of files with multibyte filenames
Date: Sun, 05 Jun 2022 15:54:18 +0300

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Sun, 05 Jun 2022 19:42:49 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Then why not encode in UTF-8, for example?
> 
> How about (or file-name-coding-system default-file-name-coding-system)
> instead?  AFAICT, that's what ENCODE_FILE does.

Yes.  Sorry, I forgot that the code was in Lisp, not C.
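A minimal Lisp sketch of that suggestion follows.  The helper name
my-dnd-encode-file-name is hypothetical and not part of Emacs; it only
assumes the two variables named in the quoted expression and the
standard encode-coding-string function.

```elisp
;; Hypothetical helper, not part of Emacs: encode FILE with the
;; coding system suggested in the quoted expression above.
(defun my-dnd-encode-file-name (file)
  "Return FILE encoded with the coding system used for file names."
  (let ((coding (or file-name-coding-system
                    default-file-name-coding-system)))
    (if coding
        (encode-coding-string file coding)
      ;; Neither variable is set: return FILE unchanged.
      file)))
```

Whether this matches ENCODE_FILE in every corner case is not something
the sketch tries to establish.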

> > If some program other than Emacs is the target of the drop, raw bytes
> > produced from raw-text will not be meaningful for it.
> 
> Why not?  Aren't those bytes equivalent to a C string describing a file
> name that can be passed to `open'?

Not necessarily.  First, non-ASCII characters can be encoded in
different ways, and the other program might not necessarily support
more than just the locale's encoding.  And second, any characters to
which Emacs gives codepoints beyond the Unicode codespace (something
that is rare, but it does happen) will not be understood by the other
programs at all, because their codepoints are completely private to
Emacs.
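
To illustrate the first point, here is a small sketch showing that the
same file name yields different byte sequences under two encodings.
The file name "café.txt" and the variable names are made up for the
illustration; only encode-coding-string is assumed.

```elisp
;; Illustration only: one file name, two encodings, two byte strings.
(let* ((name "café.txt")
       (utf8-bytes (encode-coding-string name 'utf-8))
       (latin1-bytes (encode-coding-string name 'latin-1)))
  ;; UTF-8 spends two bytes on "é", Latin-1 spends one, so a program
  ;; that expects one of them will misread the other.
  (list (length utf8-bytes)
        (length latin1-bytes)
        (equal utf8-bytes latin1-bytes)))
```

A drop target that assumes the locale's encoding has no way to tell
which of the two byte strings it received.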

> I wrote that code according to how C_STRINGs are already encoded in
> select.el:
> 
>          ((eq type 'C_STRING)
>             ;; According to ICCCM Protocol v2.0 (para 2.7.1), C_STRING
>             ;; is a zero-terminated sequence of raw bytes that
>             ;; shouldn't be interpreted as text in any encoding.
>             ;; Therefore, if STR is unibyte (the normal case), we use
>             ;; it as-is; otherwise we assume some of the characters
>             ;; are eight-bit and ensure they are converted to their
>             ;; single-byte representation.
>             (or (null (multibyte-string-p str))
>                 (setq str (encode-coding-string str 'raw-text-unix))))

See the comment: it explicitly talks about "strings" that aren't text.
File names are always human-readable text, or at least they should be.

> > I actually don't understand why you don't use ENCODE_FILE for files
> > and ENCODE_SYSTEM for everything else -- this is the only encoding
> > which we know to be generally suitable for any operation that calls
> > low-level C APIs whose implementation is not in Emacs.  Bonus points
> > for adhering to selection-coding-system when that is non-nil.
> >
> > Are there any known problems with using these two system encodings in
> > this case?
> 
> Yes: the entire selection mechanism is implemented in Lisp, and moving
> parts to C specifically would require some rethinking of the C code
> involved, and wouldn't be backwards-compatible.

No need to move anything to C: you can do the same in Lisp.  See
above.

> The FILE_NAME target has existed for decades in Lisp for programs that
> comply with the ICCCM and also deals with all kinds of file name
> encodings (see the call to `xselect--encode-string' in
> `xselect-convert-to-filename'), so I don't see why this code cannot.

<Shrug> I guess that other code is also incorrect, and was never
seriously tested with non-ASCII file names outside of UTF-8 locales.
Try an Emacs whose file-name-coding-system is iso-2022-jp or some such.
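
A rough way to see the difference without changing the session's
settings is to let-bind the variable.  This is only a sketch: the file
name is invented, and it exercises encode-coding-string directly rather
than the actual selection code.

```elisp
;; Sketch: compare the bytes produced for one file name when
;; file-name-coding-system is iso-2022-jp against plain UTF-8.
(let* ((file-name-coding-system 'iso-2022-jp)
       (name "日本語.txt")
       (as-file-name (encode-coding-string
                      name
                      (or file-name-coding-system
                          default-file-name-coding-system)))
       (as-utf-8 (encode-coding-string name 'utf-8)))
  ;; The two byte strings differ, so code that silently assumes
  ;; UTF-8 would hand the drop target the wrong name.
  (equal as-file-name as-utf-8))
```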


