emacs-orgmode

Re: [O] Encoding Problem in export?


From: David Maus
Subject: Re: [O] Encoding Problem in export?
Date: Thu, 25 Jul 2013 06:05:24 +0200
User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.8 EasyPG/1.0.0 Emacs/24.3.50 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

Hi Nicolas,
Hi Nick,

At Wed, 24 Jul 2013 13:09:05 +0200,
Nicolas Goaziou wrote:
> 
> Hello,
> 
> Nick Dokos <address@hidden> writes:
> 
> > Maybe the thing to do is to delete '=' from org-link-escape-chars and
> > see what problems arise.
> 
> AFAICT, `url-encode-url' is subtler than that. It encodes characters
> whenever they are really forbidden, which is not the case of
> `org-link-escape'. Hence my initial question: do we need to reinvent the
> wheel?
> 
> > But I did find that '%' was originally in org-link-escape-chars and
> > David Maus hardcoded it (commit 139cc1d4), so that it is *always*
> > escaped.
> 
> I Cc David Maus in case he has time to enlighten us about his choice.
>

IIRC org-link-escape is not used to create URLs but to escape
characters in a link that would otherwise conflict with Orgmode syntax
(e.g. square brackets). Org applies percent escaping to a link before
it is stored in the buffer and applies unescaping when it reads a link
back.

The percent sign is hardcoded because, if org-link-escape/unescape is
used in this way, we must make sure that the identity of a link is
preserved across the escape/unescape round trip. If we did *not*
escape the percent sign, then an original link that already contains
percent-encoded characters would be read back wrongly, i.e. with
those percent-encoded characters decoded.

This would break such links.
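
A minimal Python sketch of the round trip, for illustration only. The
`escape' helper and the character sets passed to it are hypothetical
stand-ins, not Org's actual org-link-escape or org-link-escape-chars;
the point is only that the round trip preserves the link when the
percent sign itself is among the escaped characters.

```python
from urllib.parse import unquote

def escape(link, chars):
    """Hypothetical stand-in for org-link-escape: percent-escape every
    character of `link` that appears in `chars` (ASCII only)."""
    return "".join("%{:02X}".format(ord(c)) if c in chars else c for c in link)

def round_trip(link, chars):
    """Escape when storing the link, unescape when reading it back."""
    return unquote(escape(link, chars))

# A link that already contains a percent-encoded bracket (%5B).
original = "http://example.org/a%5Bb[c]"

# Brackets escaped, percent sign left alone: the pre-existing %5B is
# decoded on the way back, so the link changes.
without_percent = round_trip(original, "[]")

# Percent sign escaped as well: the link survives unchanged.
with_percent = round_trip(original, "[]%")

print(without_percent == original)
print(with_percent == original)
```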

E.g. consider a redirector link to the target URL
`http://target.example.org?id=33&format=html':

,----
| http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml
`----

If we don't escape the percent sign but apply unescaping when, say,
the user opens the link we would get:

,----
| http://redirect.example.org?url=http://target.example.org?id=33&format=html
`----

And voila: The `format' parameter is turned into a query parameter of
redirect.example.org, not target.example.org.
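
The effect can be checked with a small Python sketch. It uses Python's
urllib.parse rather than anything from Org, and the `query_params'
helper is made up for this example; it just lists the query parameters
the redirector would see.

```python
from urllib.parse import urlsplit, parse_qs, unquote

stored = ("http://redirect.example.org?url="
          "http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml")

def query_params(url):
    """Return the query parameters of `url` as the receiving server parses them."""
    return parse_qs(urlsplit(url).query)

# Link used as stored: the redirector sees one parameter, the full target URL.
intact = query_params(stored)

# Link unescaped before use: `format' now belongs to the redirector.
broken = query_params(unquote(stored))

print(intact)
print(broken)
```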

The spec (RFC 3986) has the following to say about escaping:

,----
|    Because the percent ("%") character serves as the indicator for
|    percent-encoded octets, it must be percent-encoded as "%25" for that
|    octet to be used as data within a URI.  Implementations must not
|    percent-encode or decode the same string more than once, as decoding
|    an already decoded string might lead to misinterpreting a percent
|    data octet as the beginning of a percent-encoding, or vice versa in
|    the case of percent-encoding an already percent-encoded string.
`----
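
Both failure modes named in that paragraph are easy to reproduce. A
short Python sketch, again using urllib.parse only as a convenient
percent-encoder and not as a model of what Org does:

```python
from urllib.parse import quote, unquote

# Data consisting of three literal characters: '%', '4', '1'.
data = "%41"

# Encoded once, the percent sign becomes %25.
encoded = quote(data)

# Decoding once restores the data; decoding a second time misreads the
# literal percent sign as the start of a percent-encoding.
once = unquote(encoded)
twice = unquote(once)

# The reverse mistake: encoding an already encoded string.
double_encoded = quote(quote("a b"))

print(encoded, once, twice, double_encoded)
```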

There is, of course, the nasty thing that we don't know if the link in
a buffer went through org-link-escape or not. E.g. if you paste

,----
| [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
`----

into the buffer you'll get a broken link because org-link-open assumes
the link to be escaped by org.
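
A sketch of why the two cases cannot be told apart once the text is in
the buffer. The `escape' helper and its default character set are
assumptions made for this example, not Org's implementation.

```python
from urllib.parse import unquote

def escape(link, chars="[]%"):
    """Hypothetical stand-in for org-link-escape (ASCII only)."""
    return "".join("%{:02X}".format(ord(c)) if c in chars else c for c in link)

# Case 1: Org stores a link containing a literal bracket and escapes it.
stored_by_org = escape("http://example.org/a[b")

# Case 2: the user pastes a link that already contains %5B and means it literally.
pasted_by_user = "http://example.org/a%5Bb"

# The buffer text is identical, so unescaping on open treats both the same way.
same_text = stored_by_org == pasted_by_user
opened = unquote(pasted_by_user)

print(same_text)
print(opened)
```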

The bottom line: Org creates links programmatically (org-store-link)
and needs a mechanism to protect conflicting characters. It chose
percent-escaping, and in order to preserve the identity of a link Org
has to escape the escape character as well.

Hope that helps!

Best,
  -- David
-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... address@hidden
Email..... address@hidden


