bug-texinfo

Re: url protection


From: Patrice Dumas
Subject: Re: url protection
Date: Wed, 3 Aug 2022 22:46:06 +0200

On Wed, Aug 03, 2022 at 12:08:15PM -0700, Per Bothner wrote:
> On 8/3/22 06:26, Patrice Dumas wrote:
> > The standard does not seem to be clear on which encoding to use for
> > the % encodings.  URI::Escape has uri_escape() and uri_escape_utf8().
> > My feeling is that the best approach would be to first encode to the
> > output encoding and then call URI::Escape uri_escape().
> 
> If I read https://metacpan.org/pod/URI::Escape correctly,
> uri_escape_utf8 is equivalent to utf8::encode followed by uri_escape.
> 
> For html/xhtml output (including epub) I think we should keep it simple:
> always emit utf8.

This is not what we do in general for html/xhtml.  For epub we always
emit utf8, as it is mandated by the standard, but for html/xhtml, we
use, in the default case, the input encoding for the output encoding.

>  The input to url-encoding is a sequence
> of utf8-bytes. So whether to use uri_escape_utf8 or uri_escape
> depends on whether conversion to utf8 has already been done.

The conversion should not have been done already at that point; we still
have character strings in Perl's internal Unicode encoding.  But that was
not really my question.  My question was whether we should use the
output encoding to encode the string before the URI::Escape call, or
always use UTF-8, even if the document encoding is not UTF-8.
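
To make the two options concrete, here is a minimal sketch of how the
same character string yields different percent-encodings depending on
the encoding applied first.  It is written in Python with
urllib.parse.quote as a stand-in for URI::Escape, not the texi2any code;
the helper name escape_with_encoding and the sample string are
assumptions made for illustration only.

```python
from urllib.parse import quote


def escape_with_encoding(text: str, encoding: str) -> str:
    """Encode text to bytes in `encoding`, then percent-encode those bytes.

    Hypothetical helper: it mirrors the "encode first, then escape" order
    discussed above, with nothing treated as safe except unreserved
    characters.
    """
    return quote(text.encode(encoding), safe="")


name = "café"  # sample character string, still undecided on an encoding

# Option 1: always use UTF-8, whatever the document encoding is.
utf8_form = escape_with_encoding(name, "utf-8")

# Option 2: use the output encoding, here assumed to be ISO-8859-1.
latin1_form = escape_with_encoding(name, "iso-8859-1")

print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9
```

The two results are different URLs as far as a server is concerned, so
the choice determines whether a link still resolves when the output
encoding is not UTF-8.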

> -- 
>       --Per Bothner
> per@bothner.com   http://per.bothner.com/
> 


