From: Maxime Devos
Subject: Re: avoid character encoding/escaping in sxml->xml or htmlprag's sxml->html
Date: Sun, 21 Aug 2022 12:16:54 +0200
On 21-08-2022 02:05, Aleix Conchillo Flaqué wrote:

> According to the spec, embedding inline content in the <script> tag should conform to the language defined by the "type" attribute (defaults to javascript). So, I would expect you could put any string that conforms to JS.
>
> When used to include dynamic scripts, the scripts may either be embedded inline or may be imported from an external file using the src attribute. If the language is not that described by "text/javascript", then the type attribute must be present, as described below. Whatever language is used, the contents of the script element must conform with the requirements of that language's specification.

I am proposing to use XHTML (which is XML), not HTML. HTML's special parsing quirks are irrelevant here.

> It does, browsers (at least Chrome) don't interpret that correctly, since it's not valid JavaScript.

Since <script> ... </script> is XML here, the XML parser (not the HTML parser, this is XHTML!) decodes the &lt; inside <script>...</script>, and the result _after decoding_ is valid JavaScript. In XML, <script> is not special -- every element is parsed the same way.
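
To illustrate the decoding step without a browser, here is a minimal sketch using Guile's (sxml simple) module. It assumes xml->sxml accepts a string (as in recent Guile versions); the names fragment, tree and script-text are only for illustration.

```scheme
(use-modules (sxml simple))

;; A small XML fragment whose script body uses XML entity references.
(define fragment
  "<script type=\"text/javascript\">console.log(\"&lt;Hi!&gt;\");</script>")

;; xml->sxml runs an XML parser, so entity references are decoded.
(define tree (xml->sxml fragment))

;; tree has the shape (*TOP* (script (@ ...) "text" ...)).
;; Join the string children of the script element; the attribute
;; list is not a string, so it is filtered out.
(define script-element (cadr tree))
(define script-text
  (apply string-append (filter string? (cdr script-element))))

(display script-text)
(newline)
```

What the JavaScript engine receives is the decoded text, so the entities never reach it.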

Anyway, it seems to work for me, both in icecat and ungoogled-chromium:

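(use-modules (web server))
;; Parts of this string were lost in the archive. The namespace URI, the
;; console.log line and the closing tags are reconstructed from the
;; surrounding text, so line numbers need not match the errors quoted later.
(define document
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<html xmlns=\"http://www.w3.org/1999/xhtml\">
<head><title>test</title>
<script type=\"text/javascript\">
console.log(\"&lt;Hi!&gt;\");
</script></head>
<body/>
</html>")

;; Serve the document as XHTML so that browsers use their XML parser.
(define (handler request request-body)
  (values '((content-type application/xhtml+xml))
          document))

(run-server handler 'http)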
-- on the console, <Hi!> is logged, not &lt;Hi!&gt;.
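
For reference, the escaped form served above is what sxml->xml produces on its own when given the script body as an ordinary string. A minimal sketch, assuming Guile's (sxml simple) module; script-sxml and serialized are illustrative names.

```scheme
(use-modules (sxml simple))

;; The script body as an ordinary Scheme string, with literal < and >.
(define script-sxml
  '(script (@ (type "text/javascript"))
           "console.log(\"<Hi!>\");"))

;; sxml->xml writes to a port; capture the output in a string.
(define serialized
  (call-with-output-string
    (lambda (port) (sxml->xml script-sxml port))))

(display serialized)
(newline)
```

The serialized text contains &lt;Hi!&gt; rather than <Hi!>, which is the form an XML parser expects inside an element.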

If I replace &lt; by < and &gt; by > to make it 'valid JavaScript', as you appear to be proposing, I get a parsing error:

    This page contains the following errors:
    error on line 8 at column 17: error parsing attribute name
    Below is a rendering of the page up to the first error.

    XML Parsing Error: not well-formed
    Location: http://localhost:8080/
    Line Number 8, Column 17:
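
The same failure can be reproduced outside a browser. This is a rough sketch, assuming Guile's (sxml simple) parser signals an error on malformed input (it may report the problem differently from the browsers above); well-formed? is a hypothetical helper, not part of Guile.

```scheme
(use-modules (sxml simple))

;; Hypothetical helper: #t if xml->sxml parses STR, #f if it raises.
(define (well-formed? str)
  (catch #t
    (lambda () (xml->sxml str) #t)
    (lambda (key . args) #f)))

;; Escaped script body: the < and > are entity references.
(define escaped
  "<script>console.log(\"&lt;Hi!&gt;\");</script>")

;; Raw script body: <Hi!> looks like the start of a (broken) tag.
(define raw
  "<script>console.log(\"<Hi!>\");</script>")

(display (list (well-formed? escaped) (well-formed? raw)))
(newline)
```

The raw variant is rejected because an XML parser treats <Hi!> as markup, whatever the enclosing element is.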


