guile-user


From: Neil Van Dyke
Subject: Re: Using shtml with htmlprag - output of shtml->html is different to some given HTML
Date: Thu, 5 Sep 2019 18:33:57 -0400

Kenan, could you please try the "one-line" change below and let me know what you think?

(It's an attempt at a minimal fix for the problem you were seeing, and for some related problems with modern HTML: it lets a p element sit directly inside div, blockquote, footer, header, or li, not only body, td, or th.  However, it breaks backward compatibility with the htmlprag currently in guile-lib.  For example, someone doing Web scraping of modern HTML might have scraping code that only works with the previous, invalid parse.  I'm not yet familiar with guile-lib and how its htmlprag is being used, so I don't want to be too quick to suggest breaking changes to it.)

(Historical note: htmlprag was mostly written 18 years ago, when HTML was different in both standards and practice.  Today, I'd write the parser very differently, though I think there's a good chance that, with this change, htmlprag will still work for one's purposes.)

Neil

--- htmlprag.scm.ORIG    2019-09-05 18:21:40.850220789 -0400
+++ htmlprag.scm    2019-09-05 18:21:40.850220789 -0400
@@ -1099,7 +1099,7 @@
               (meta     . (head))
               (noframes . (frameset))
               (option   . (select))
-              (p        . (body td th))
+              (p        . (div blockquote body footer header li td th))
               (param    . (applet))
               (tbody    . (table))
               (td       . (tr))
@@ -1989,6 +1989,13 @@
     (t1 "<script>xxx"  '((script "xxx")))
     (t1 "<script/>xxx" '((script) "xxx"))

+    (t1 "<div><p>x</p></div>" '((div        (p "x"))))
+    (t1 "<header><p>x</p></>" '((header     (p "x"))))
+    (t1 "<footer><p>x</p></>" '((footer     (p "x"))))
+    (t1 "<blockquote><p>x</p></blockquote>" '((blockquote (p "x"))))
+    (t1 "<ul><li><p>x</p></li></ul>" '((ul (li     (p "x")))))
+    (t1 "<ol><li><p>x</p></li></ol>" '((ol (li     (p "x")))))
+
     ;; TODO: Add verbatim-pair cases with attributes in the end tag.

     (t2 '(p)            "<p></p>")
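To show why widening the p entry matters, here is a minimal, hypothetical model of how a parent-constraint table like the one in the patch could be applied while parsing. It is not htmlprag's actual code. The names open-element, old-constraints, and new-constraints are made up for illustration. The model assumes that when a start tag arrives, the parser closes open elements until it reaches the nearest permitted parent.

```scheme
;; Hypothetical model of a parent-constraint table; NOT htmlprag's code.
;; Each entry maps a child element to the elements allowed as its parent.
(define old-constraints '((p . (body td th))))
(define new-constraints '((p . (div blockquote body footer header li td th))))

;; OPEN is the stack of currently open elements, innermost first,
;; e.g. (div body html) means <html><body><div> is open.
;; Returns the stack after opening TAG.  Open elements above the nearest
;; permitted parent are closed (popped).  If TAG has no entry, or no
;; permitted parent is open, TAG is simply pushed (an assumption).
(define (open-element tag open constraints)
  (let ((entry (assq tag constraints)))
    (if (not entry)
        (cons tag open)
        (let loop ((rest open))
          (cond ((null? rest) (cons tag open))              ; no permitted parent open
                ((memq (car rest) (cdr entry)) (cons tag rest)) ; nearest permitted parent
                (else (loop (cdr rest))))))))               ; close this element, keep looking

(open-element 'p '(div body html) old-constraints)  ;; => (p body html)
(open-element 'p '(div body html) new-constraints)  ;; => (p div body html)
```

In this model, the old table makes a p opened inside <body><div> close the div and become its sibling, so re-serializing the result would not reproduce the original markup. With the new table, the p stays inside the div, matching what the added t1 cases expect.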