bug-texinfo

Re: Unescaped content in HTML attributes in EPUB output of texi2any


From: Benjamin Kalish
Subject: Re: Unescaped content in HTML attributes in EPUB output of texi2any
Date: Sun, 22 Sep 2024 17:27:34 -0400

It looks like the problem occurs only when raw HTML is used, either directly (as in the minimal example below) or indirectly through a macro (which is how I first encountered it):

\input texinfo

@node Top
@top

@node Cap 1
@chapter @inlineraw{html,<span class="test">}One@inlineraw{html,</span>}

@bye

Benjamin Kalish


On Sun, Sep 22, 2024 at 4:47 PM Gavin Smith <gavinsmith0123@gmail.com> wrote:
On Sun, Sep 22, 2024 at 02:20:36PM -0400, Benjamin Kalish wrote:
> EPUB output contains unescaped content in a number of HTML attributes. I'm
> seeing this with:
>
> - The content attribute for <meta> with name="description"
> - The content attribute for <meta> with name="keywords"
> - The title attribute of the <link> elements with rel="next" and rel="prev"
>
> HTML output also has these same tags and attributes, but the content seems
> fine in my case. This may not actually be due to better escaping, as it
> looks like entirely different content is being used for the attribute
> values when generating HTML, and the content is, in this case at least,
> safe without escaping.
>
> Changing the values to be the same as those used when generating HTML would
> solve the problem in my case, but it is probably best to make sure that
> attribute values are always escaped.
>
> What should be escaped? Quotation marks must be. Ambiguous ampersands must
> be. But it is probably prudent to escape all ampersands and all
> occurrences of < or >.
>
> I'm sorry I can't suggest a fix in the code—I'm not familiar with the
> Texinfo codebase and it's been decades since I've coded in Perl or C.
>
> I'm using texi2any 7.1.1
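
The escaping rule suggested in the quoted report (escape quotation marks, all ampersands, and all occurrences of < and >) can be illustrated with a minimal Python sketch. This is not how texi2any implements escaping; the helper name `escape_attribute` is hypothetical, and the sketch relies only on `html.escape` from the Python standard library.

```python
from html import escape


def escape_attribute(value: str) -> str:
    """Escape text for use inside a double-quoted HTML/XHTML attribute.

    Hypothetical helper for illustration only; not part of Texinfo.
    """
    # html.escape replaces &, < and >; with quote=True it also
    # replaces the double and single quotation marks.
    return escape(value, quote=True)


# Raw HTML like that in the report, used as an attribute value.
title = '<span class="test">One</span>'
print(f'<link rel="next" title="{escape_attribute(title)}"/>')
```

Escaping every ampersand, rather than only ambiguous ones, keeps the rule simple and yields a value that is valid in both HTML and XHTML attributes.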

I tried testing this on the master development branch and it looked
ok:

$ cat test.texi
\input texinfo

@node Top
@top

@node Cap 1
@chapter One "<>

@bye

After running "texi2any --epub3 test.texi" and extracting the
resulting "test.epub" file, the output file in the ZIP archive had, in
"test/EPUB/xhtml/Cap-1.xhtml", the " < and > escaped (see output below).
Can you please explain how to reproduce the problem?


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>1 One &quot;&lt;&gt; (Untitled Document)</title>

<meta name="description" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
<meta name="keywords" content="1 One &quot;&lt;&gt; (Untitled Document)"/>
<meta name="resource-type" content="document"/>
<meta name="distribution" content="global"/>
<meta name="Generator" content="texi2any"/>
<meta name="viewport" content="width=device-width,initial-scale=1"/>

<link href="" rel="start" title=""/>
<link href="" rel="index" title="1 One &quot;&lt;&gt;"/>
<link href="" rel="up" title=""/>
<link href="" rel="prev" title=""/>


</head>

<body lang="en">
<div class="chapter-level-extent" id="Cap-1">

<h2 class="chapter" id="One-_0022_003c_003e">1 One &quot;&lt;&gt;</h2>



</div>



</body>
</html>
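
The manual check described above (extract the EPUB and inspect the XHTML files) can be approximated with a short Python sketch. It assumes that unescaped `<` or `"` characters in an attribute value make the XHTML ill-formed, so an XML parser rejects the file. The function name `find_malformed_xhtml` is hypothetical, and only the standard library is used.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_xhtml(epub_path):
    """Return (member name, error message) pairs for XHTML files that fail to parse.

    Hypothetical helper for illustration only; epub_path may be a file
    name or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".xhtml"):
                continue
            try:
                # A literal < inside an attribute value is not
                # well-formed XML, so parsing raises ParseError.
                ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                problems.append((name, str(err)))
    return problems
```

Calling `find_malformed_xhtml("test.epub")` on the file produced by `texi2any --epub3 test.texi` would list any XHTML members that are not well-formed. One caveat: named entities other than the five predefined in XML would also be reported as errors, so a non-empty result needs a manual look.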
