bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
From: Eli Zaretskii
Subject: bug#31149: 27.0.50; (gui-get-selection nil 'text/html) returns mis-decoded text
Date: Sat, 05 May 2018 12:37:24 +0300
Ping! Ping!
> Date: Tue, 24 Apr 2018 21:11:10 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, 31149@debbugs.gnu.org, monnier@IRO.UMontreal.CA
>
> Ping!
>
> > Date: Sat, 14 Apr 2018 09:32:41 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: larsi@gnus.org, 31149@debbugs.gnu.org
> >
> > > From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> > > Date: Fri, 13 Apr 2018 16:55:26 -0400
> > > Cc: Lars Ingebrigtsen <larsi@gnus.org>
> > >
> > > (gui-get-selection nil 'text/html)
> > >
> > > returns utf-16 text when the primary selection is owned by Mozilla, but
> > > we decode it as latin-1 instead, so it looks like garbage.
> > >
> > > I don't know why we're getting utf-16. Is that what standards say it
> > > should do? If so, we should adjust our code (which currently knows
> > > nothing about the `text/html` target-type).
> > >
> > > As for why we decode it as latin-1, it's (under GNU/Linux; Lars may be
> > > using something else because he's getting something with a `charset`
> > > property which I don't get here) because:
> > > - selection_data_to_lisp_data (in xselect.c) makes a unibyte string with
> > >   the property `foreign-selection` set to `STRING` when the actual
> > >   string type is not known (as opposed to COMPOUND-TEXT and
> > >   UTF8-STRING, basically).
> > > - in gui-get-selection we then have a mapping from `STRING` to
> > >   `iso-8859-1` (which is apparently the right thing for the official
> > >   `STRING` target-type in X11).
> > >
> > > I can't figure out if/where these kinds of things about the X11
> > > selection protocol are described, but at least in `xclip` they have
> > > a hack specifically for this case:
> > >
> > > [...]
> > >     if (html != None && sel_type == html) {
> > >         /* if the buffer contains UCS-2 (UTF-16), convert to
> > >          * UTF-8. Mozilla-based browsers do this for the
> > >          * text/html target.
> > >          */
> > > [...]
> > >
> > > and according to the subsequent code it's not even always the
> > > same endianness.
> > >
> > > I don't know what the difference is between the `target-type` passed to
> > > x-get-selection-internal and the `foreign-selection` property we get on
> > > the returned string (they seem to be the same in my tests, except when
> > > the type is not one of the known ones, and where we then force
> > > `foreign-selection` to be `STRING`).
> >
> > I hope Handa-san (CC'ed) could comment on this.
>
>
>
>
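For readers following the thread, the kind of workaround the quoted message attributes to xclip (notice that a text/html selection is really UTF-16, work out its byte order, and convert to UTF-8) might be sketched in C roughly as below. This is only an illustration, not code from xclip or Emacs: every function and type name is hypothetical, and the zero-byte heuristic used when there is no byte order mark is an assumption that only suits mostly-ASCII markup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none come from xclip or Emacs.  */
enum utf16_order { UTF16_UNKNOWN, UTF16_LE, UTF16_BE };

/* Return nonzero if BUF starts with a UTF-16 byte order mark.  */
int
has_utf16_bom (const unsigned char *buf, size_t len)
{
  return len >= 2 && ((buf[0] == 0xFF && buf[1] == 0xFE)
                      || (buf[0] == 0xFE && buf[1] == 0xFF));
}

/* Guess the byte order: trust a BOM if present; otherwise assume
   mostly-ASCII text and see which byte of each pair tends to be zero.  */
enum utf16_order
guess_utf16_order (const unsigned char *buf, size_t len)
{
  size_t even_zero = 0, odd_zero = 0, i;

  if (has_utf16_bom (buf, len))
    return buf[0] == 0xFF ? UTF16_LE : UTF16_BE;
  for (i = 0; i + 1 < len; i += 2)
    {
      even_zero += (buf[i] == 0);
      odd_zero += (buf[i + 1] == 0);
    }
  if (odd_zero > even_zero)
    return UTF16_LE;
  if (even_zero > odd_zero)
    return UTF16_BE;
  return UTF16_UNKNOWN;
}

/* Read one 16-bit code unit at P in the given byte order.  */
uint32_t
read_unit (const unsigned char *p, enum utf16_order order)
{
  return order == UTF16_LE ? (uint32_t) (p[0] | (p[1] << 8))
                           : (uint32_t) ((p[0] << 8) | p[1]);
}

/* Convert LEN bytes of UTF-16 in BUF to UTF-8 in OUT (capacity CAP).
   Returns the number of bytes written (no NUL is appended), or
   (size_t) -1 if the byte order is unknown or OUT is too small.
   A leading BOM is skipped; unpaired surrogates become U+FFFD.  */
size_t
utf16_to_utf8 (const unsigned char *buf, size_t len, char *out, size_t cap)
{
  enum utf16_order order;
  size_t i, n = 0;

  assert (buf != NULL && out != NULL);
  order = guess_utf16_order (buf, len);
  if (order == UTF16_UNKNOWN)
    return (size_t) -1;
  i = has_utf16_bom (buf, len) ? 2 : 0;
  while (i + 1 < len)
    {
      uint32_t c = read_unit (buf + i, order);
      size_t need;

      i += 2;
      if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len)
        {
          uint32_t lo = read_unit (buf + i, order);
          if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
              /* Combine a surrogate pair into one code point.  */
              c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
              i += 2;
            }
        }
      if (c >= 0xD800 && c <= 0xDFFF)
        c = 0xFFFD;  /* unpaired surrogate */
      need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
      if (n + need > cap)
        return (size_t) -1;
      if (need == 1)
        out[n++] = (char) c;
      else if (need == 2)
        {
          out[n++] = (char) (0xC0 | (c >> 6));
          out[n++] = (char) (0x80 | (c & 0x3F));
        }
      else if (need == 3)
        {
          out[n++] = (char) (0xE0 | (c >> 12));
          out[n++] = (char) (0x80 | ((c >> 6) & 0x3F));
          out[n++] = (char) (0x80 | (c & 0x3F));
        }
      else
        {
          out[n++] = (char) (0xF0 | (c >> 18));
          out[n++] = (char) (0x80 | ((c >> 12) & 0x3F));
          out[n++] = (char) (0x80 | ((c >> 6) & 0x3F));
          out[n++] = (char) (0x80 | (c & 0x3F));
        }
    }
  return n;
}
```

Without a BOM the guess can be wrong for text that is not mostly ASCII, so a real fix would presumably rely on whatever the selection owner documents rather than on this heuristic.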