emacs-orgmode

Re: [O] Encoding Problem in export?


From: Nicolas Goaziou
Subject: Re: [O] Encoding Problem in export?
Date: Thu, 25 Jul 2013 23:46:34 +0200

Hello,

David Maus <address@hidden> writes:

> IIRC org-link-escape is not used to create URLs but to escape
> characters in a link that would otherwise conflict with Orgmode syntax
> (e.g. square brackets).

> Org applies percent escaping to a link before
> it is stored in the buffer and applies unescaping when it reads a link
> back.
>
> The percent sign is hardcoded because if org-link-escape/unescape is
> used in this way we must make sure that the identity of a link is
> preserved. If we did *not* escape the percent sign, then an original
> link with percent-encoded characters would be read back wrongly,
> i.e. with the percent-escaped characters unescaped.

[...]

> There is, of course, the nasty thing that we don't know if the link in
> a buffer went through org-link-escape or not. E.g. if you paste
>
> ,----
> | [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
> `----
>
> into the buffer you'll get a broken link because org-link-open assumes
> the link to be escaped by org.
>
> The bottom line: Org creates links programmatically (org-store-link)
> and needs a mechanism to protect conflicting characters. It chose
> percent-escaping, and in order to preserve the identity of a link Org
> has to escape the escape character.
>
> Hope that helps!

It does.

I think we are hunting two hares and that's why we are failing so far.

There are two URI transformations involved. One is mandatory (escape
square brackets in the URI), and the other one is optional (normalize the
URI for consumption by external processes). The former must be
bi-directional, as escaping brackets must be transparent to the user
(e.g., when editing a link with `org-insert-link'). The latter needn't be,
and can happen on the fly, just before the URI is sent to whatever needs
it (e.g., a browser).

Therefore, I suggest using three functions:

  - `org-link-escape' will first %-escape "%" characters, and then "["
    and "]" characters. `org-link-unescape' will reverse the operation.

    These functions cannot break a link, encoded or not. They are applied
    when a link is created programmatically and read back for user
    editing.

  - `org-link-encode'[1] will %-escape every forbidden character in the
    URI. It doesn't need any "reverse" function. It will be called when
    opening a link, or parsing it.

    I think it shouldn't escape "%" characters, though, so that it can
    be applied to both encoded and plain strings. Since it isn't perfect
    (it doesn't parse the URI), it should also be very conservative (i.e.
    allow more characters such as "=" or "&") and not get in the way.
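
To make the proposal concrete, here is a minimal sketch of the three
functions, written in Python rather than Emacs Lisp so that it can be run
outside Emacs. The names `link_escape`, `link_unescape` and `link_encode`
are hypothetical stand-ins for the functions described above, not existing
Org code. The set of characters that `link_encode` leaves alone (the
RFC 3986 reserved characters plus "%") is an assumption about what "very
conservative" could mean, not a settled choice.

```python
import re
from urllib.parse import quote

# Assumption: leave RFC 3986 reserved characters and "%" untouched.
# Unreserved characters (letters, digits, "-", ".", "_", "~") are never
# encoded by urllib.parse.quote in the first place.
SAFE_CHARS = "%:/?#[]@!$&'()*+,;="

# The only sequences link_escape can produce, and what they stand for.
_UNESCAPE_TABLE = {"%25": "%", "%5B": "[", "%5D": "]"}


def link_escape(link: str) -> str:
    """Hypothetical analogue of the proposed `org-link-escape'."""
    # "%" goes first, so the "%5B"/"%5D" added afterwards stay unambiguous.
    return link.replace("%", "%25").replace("[", "%5B").replace("]", "%5D")


def link_unescape(link: str) -> str:
    """Hypothetical analogue of the proposed `org-link-unescape'."""
    # A single left-to-right pass: "%255B" becomes "%5B", never "[".
    return re.sub(r"%(?:25|5B|5D)",
                  lambda m: _UNESCAPE_TABLE[m.group(0)], link)


def link_encode(link: str) -> str:
    """Hypothetical analogue of the proposed `org-link-encode'."""
    # One-way normalization: existing "%XX" sequences are kept as they
    # are, everything outside SAFE_CHARS is percent-encoded as UTF-8.
    return quote(link, safe=SAFE_CHARS)
```

With these definitions, the already-encoded link from David's example
comes back unchanged from `link_unescape(link_escape(url))`, and
`link_encode` leaves it alone too, since it only touches characters
outside the allowed set. Applying `link_encode` twice gives the same
result as applying it once, which is what makes it safe to call on both
encoded and plain strings.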

WDYT?


Regards,

[1] `url-encode-url' was introduced in Emacs 24.3. It is too young to be
used mainstream, even though it does a better job than
`org-link-escape'. We will benefit from it when Emacs 25 is out (i.e.
when Emacs 23 support is dropped).

-- 
Nicolas Goaziou


