bug-texinfo

Re: url protection


From: Patrice Dumas
Subject: Re: url protection
Date: Sat, 6 Aug 2022 19:01:18 +0200

On Sat, Aug 06, 2022 at 03:20:15PM +0100, Gavin Smith wrote:
> Characters should be protected if they are not part of the syntax of the URL
> but they could be.
> 
> Maybe more readable than the WHATWG documentation:
> https://www.rfc-editor.org/rfc/rfc3986#page-12
> 
> This gives a list of reserved characters, of which there are quite a few.
> (It's likely that not all of them occur in Texinfo output.)

Why not?  In an @uref, the user may well put anything, possibly using
percent-encoded or unencoded text.

> So if an image filename has a colon in it, that colon should be encoded
> in the href attribute, but a colon that follows the protocol (http:) should
> not be encoded, as you say.  Perhaps the percent encoding algorithm could
> be performed on a subset of the URL, rather than taking a URL string and
> percent encoding throughout.

Indeed, I also figured out that image files, which we know are file names
and not true URLs, should get much more protection.  What I will commit
will have everything percent encoded, except for / and :, as : can follow
a drive letter on Windows.  I do not know of other separators that
could be used in file names.
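
As a rough illustration of the file name case, here is a sketch in
Python.  It is not the code committed to Texinfo; the function name is
hypothetical, and leaving ASCII letters, digits and _ . - ~ unencoded
is an assumption based on how urllib.parse.quote behaves.

```python
from urllib.parse import quote


def protect_image_file_name(file_name: str) -> str:
    """Percent encode a file name for use in an href or src attribute.

    Hypothetical helper: '/' and ':' are kept as-is (path separator and
    Windows drive letter); other characters are encoded as UTF-8 bytes,
    except the unreserved ASCII set that quote() never encodes.
    """
    return quote(file_name, safe="/:")


print(protect_image_file_name("C:/my images/fig#1.png"))
```

A '#' or '?' in a file name is encoded here, so it cannot be mistaken
for a fragment or query delimiter.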

> The treatment of @url/@uref could be different, as you say.  The user provides
> the entire URL in the source document.  Arguably it is up to the user to
> percent encode appropriately within the URL, and non-ASCII bytes inside the
> argument are a risk that the user has made as to whether they are valid or
> not.

In that case, and for @email too, I settled on percent encoding
'lightly': non-ASCII characters, { and }, spaces and not much more,
keeping the characters that can occur in URLs as you describe above, as
well as %.
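
The 'light' protection could look roughly like the following Python
sketch.  The function name is hypothetical, and the set of encoded
characters is an assumption limited to those named above (non-ASCII
characters, braces and spaces); the real code may encode a few more.

```python
import re

# Assumption: only non-ASCII characters, space, '{' and '}' are encoded.
# '%' and the RFC 3986 reserved characters are left untouched.
_LIGHTLY_ENCODED = re.compile(r"[^\x00-\x7f]|[ {}]")


def protect_url_lightly(url: str) -> str:
    """Percent encode only the characters matched by _LIGHTLY_ENCODED."""

    def encode(match: re.Match) -> str:
        # Encode each UTF-8 byte of the matched character as %XX.
        return "".join(f"%{byte:02X}" for byte in match.group(0).encode("utf-8"))

    return _LIGHTLY_ENCODED.sub(encode, url)


print(protect_url_lightly("http://example.org/a b/{x}?q=\u00e9&r=%20"))
```

Because % is kept, text that the user already percent encoded is not
encoded a second time.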

-- 
Pat


