From: Neil Van Dyke
Subject: Re: Using shtml with htmlprag - output of shtml->html is different to some given HTML
Date: Thu, 5 Sep 2019 18:33:57 -0400

Kenan, could you please try the below "one-line" change, and let me know
what you think?

(It's an attempt at a minimal fix for the problem you were seeing, and
for some related problems with modern HTML. However, it breaks
backward-compatibility relative to the htmlprag currently in guile-lib.
For example, consider someone doing Web scraping of modern HTML, and
their scraping code only works with the previous, invalid parse. I'm
not yet familiar with guile-lib and how the htmlprag in it is being
used, so I don't want to be too quick to suggest breaking changes to it.)

(Historical note: htmlprag was mostly written 18 years ago, when HTML
was different in both standards and practice. Today, I'd write the
parser very differently, though I think there's a good chance that
htmlprag will still work for one's purpose, with this change.)

Neil

--- htmlprag.scm.ORIG 2019-09-05 18:21:40.850220789 -0400
+++ htmlprag.scm 2019-09-05 18:21:40.850220789 -0400
@@ -1099,7 +1099,7 @@
(meta . (head))
(noframes . (frameset))
(option . (select))
- (p . (body td th))
+ (p . (div blockquote body footer header li td th))
(param . (applet))
(tbody . (table))
(td . (tr))
@@ -1989,6 +1989,13 @@
(t1 "<script>xxx" '((script "xxx")))
(t1 "<script/>xxx" '((script) "xxx"))
+ (t1 "<div><p>x</p></div>" '((div (p "x"))))
+ (t1 "<header><p>x</p></>" '((header (p "x"))))
+ (t1 "<footer><p>x</p></>" '((footer (p "x"))))
+ (t1 "<blockquote><p>x</p></blockquote>" '((blockquote (p "x"))))
+ (t1 "<ul><li><p>x</p></li></ul>" '((ul (li (p "x")))))
+ (t1 "<ol><li><p>x</p></li></ol>" '((ol (li (p "x")))))
+
;; TODO: Add verbatim-pair cases with attributes in the end tag.
(t2 '(p) "<p></p>")
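
To make the effect of the one-line change easier to see, here is a rough,
self-contained sketch of how a parent-constraint table like the one in the
first hunk can be consulted. It is only a simplified model, not htmlprag's
actual parser logic: the names old-constraints, new-constraints, and
allowed-parent? are hypothetical and were made up for this illustration, and
the two tables contain only the `p` entry from the diff.

```scheme
;; Simplified model of a parent-constraint table (NOT htmlprag's real code).
;; All names here are hypothetical and exist only for illustration.

;; The `p` entry before the patch.
(define old-constraints
  '((p . (body td th))))

;; The `p` entry after the patch.
(define new-constraints
  '((p . (div blockquote body footer header li td th))))

;; Return #t if ELEM may appear directly inside PARENT according to TABLE.
;; Elements with no entry in TABLE are treated as unconstrained (assumption).
(define (allowed-parent? table elem parent)
  (let ((entry (assq elem table)))
    (if entry
        (if (memq parent (cdr entry)) #t #f)
        #t)))

(display (allowed-parent? old-constraints 'p 'div)) (newline) ; prints #f
(display (allowed-parent? new-constraints 'p 'div)) (newline) ; prints #t
(display (allowed-parent? new-constraints 'p 'li))  (newline) ; prints #t
```

Under this model, the old table gives a `p` no permitted place inside a
`div`, which is consistent with the unexpected parse of modern HTML described
above; with the extended list, `div`, `blockquote`, `footer`, `header`, and
`li` are all accepted as parents, which is what the new t1 cases exercise.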