Permissive html parser for guile
From: swedebugia
Subject: Permissive html parser for guile
Date: Wed, 23 Jan 2019 17:47:56 +0100
I just found this LGPL3 parser by Neil Van Dyke (see attachment).
Do we have something similar in Guile?
If not, is anybody interested in porting it? (I have no idea how much
work it would be, but Racket seems quite close to Guile.)
Here is the introduction:
"The html-parsing library provides a permissive HTML parser. The parser
is useful for software agent extraction of information from Web pages,
for programmatically transforming HTML files, and for implementing
interactive Web browsers. html-parsing emits SXML/xexp, so that
conventional HTML may be processed with XML tools such as SXPath. Like
Oleg Kiselyov’s SSAX-based HTML parser, html-parsing provides a
permissive tokenizer, but html-parsing extends this by attempting to
recover syntactic structure.
The html-parsing parsing behavior is permissive in that it accepts
erroneous HTML, handling several classes of HTML syntax errors
gracefully, without yielding a parse error. This is crucial for parsing
arbitrary real-world Web pages, since many pages actually contain syntax
errors that would defeat a strict or validating parser. html-parsing’s
handling of errors is intended to generally emulate popular Web
browsers’ interpretation of the structure of erroneous HTML."
https://docs.racket-lang.org/html-parsing/index.html
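
To make "permissive" and "emits SXML" concrete, here is a rough sketch of the
idea. It is written in Python with only the standard library, not in Guile or
Racket, and it does not reproduce html-parsing's actual algorithm. The names
SxmlBuilder and html_to_sxml are made up for this illustration. It builds
SXML-shaped nested lists, ignores stray end tags, and closes any elements left
open at the end of the input instead of raising a parse error.

```python
from html.parser import HTMLParser

# Elements that never have content (a small illustrative subset).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SxmlBuilder(HTMLParser):
    """Hypothetical helper: builds SXML-like nested lists from HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ["*TOP*"]
        self.stack = [self.root]  # currently open elements, root first

    def handle_starttag(self, tag, attrs):
        node = [tag]
        if attrs:
            # SXML keeps attributes in an (@ (name value) ...) sublist.
            node.append(["@"] + [[k, v if v is not None else k]
                                 for k, v in attrs])
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with
        # anything still open inside it. A stray end tag is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].append(data)


def html_to_sxml(text):
    builder = SxmlBuilder()
    builder.feed(text)
    builder.close()  # elements still open are simply left as parsed
    return builder.root


if __name__ == "__main__":
    # Unclosed <a> and a stray </b>: no error is raised.
    print(html_to_sxml('<a href="x">link</b>'))
```

A real port would also need the structure recovery the introduction mentions,
for example treating a second <p> as closing the first, which this sketch does
not attempt.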
--
Cheers Swedebugia
Attachment: main.rkt
Description: Text document