Re: GNUstep Web browser (was Re: WebKit Bounty)

discuss-gnustep


From:	Robert Slover
Subject:	Re: GNUstep Web browser (was Re: WebKit Bounty)
Date:	Sun, 04 Mar 2007 20:49:59 -0500


On Mar 4, 2007, at 2:12 AM, Gregory John Casamento wrote:

Rogelio,


... [elided] ...

If html is so easy to do wrong and so hard to handle then we put a
bullet in the s*****'s head  and move on.
It's not that easy... it's nice to say that we will make a parser that will only handle "correct" HTML, but when you consider that this will make the browser virtually useless for navigating almost half of the web pages out there, the idea loses its appeal. If you write a from-scratch implementation you will need to handle such pages, if you want anyone to actually use it.
Later, GJC

... [elided] ...

I do not know if this helps or not, but I'll make the suggestion anyway. Several years ago I needed a parser for a project at work that could help extract all of the links and URL references in a set of related HTML documents, then let me rewrite the documents. This had two purposes -- rewriting a set of HTML pages as a multi-part related MIME message including all images and directly related documents for emailing, and 'retargeting' -- moving a set of related HTML pages into an altered hierarchy simply by describing the relationships between two hierarchies (from the one used in our application to the one used by an arbitrary customer Intranet) and a starting point. The real monkey wrench was that the HTML was often very sloppy, containing fragments of HTML customers had entered themselves to customize the output, as well as incorrect HTML produced by 3rd-party software modules (which we had source to, but no budgeted time to fix). While the latter we could do something about, the former we could not.

My solution was to use HTML-Tidy, a W3C project by Dave Raggett (http://www.w3.org/People/Raggett/tidy/). There was a project underway at the time to turn Tidy into a library, but it still had a way to go -- so, instead, one of our developers took about 3 days and turned it into a library suitable to our purpose that worked where we needed it to -- AIX and Solaris. He gave it an interface that was very much like SAX, on top of which we wrote our logic to rewrite pages on the fly. The Tidy code was very clean and easy to understand C, so this was a straightforward endeavor. We were then able to handle broken pages, with the added advantage that pages that were externalized by the application in this way were also "correct" HTML, regardless of fragmentary or incorrect input. This has worked so well that we've not had to touch it since (5 or 6 years).
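To make the SAX-style "retargeting" idea a little more concrete, here is a minimal sketch. It is not the Tidy-based library described above (that code is not shown in this message); it swaps in Python's standard `html.parser`, which is similarly event-driven and tolerant of sloppy markup, though unlike Tidy it does not repair the document structure. The names `LinkRewriter`, `make_retargeter`, `rewrite`, `URL_ATTRS`, and the example paths are all hypothetical, and the list of URL-bearing attributes is an assumption kept deliberately short.

```python
from html import escape
from html.parser import HTMLParser

# Attributes assumed to carry URL references (a real tool would need more).
URL_ATTRS = {"href", "src"}


class LinkRewriter(HTMLParser):
    """Re-emit a document event by event, passing each URL through `retarget`."""

    def __init__(self, retarget):
        # convert_charrefs=False so entities in text are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.retarget = retarget
        self.links = []  # every URL reference seen, in document order
        self.out = []    # pieces of the rewritten document

    def _emit_tag(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as `disabled`
                parts.append(name)
                continue
            if name in URL_ATTRS:
                self.links.append(value)
                value = self.retarget(value)
            # The parser hands over unescaped values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def make_retargeter(mapping):
    """Build a function that swaps the longest matching path prefix."""
    prefixes = sorted(mapping, key=len, reverse=True)

    def retarget(url):
        for old in prefixes:
            if url.startswith(old):
                return mapping[old] + url[len(old):]
        return url  # not part of the described hierarchy: leave untouched

    return retarget


def rewrite(html_text, mapping):
    """Return (rewritten_html, list_of_original_links)."""
    parser = LinkRewriter(make_retargeter(mapping))
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out), parser.links
```

Calling `rewrite(page, {"/app/docs/": "/intranet/manuals/"})` would return the page with matching `href` and `src` values moved to the new hierarchy, plus the list of original references. The same list of links is what a mailer would walk to collect images and related documents for a multi-part related message.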

There, of course, now exists the official TidyLib, which I do not know a lot about, but it could be a useful tool in getting from the point of having a renderer that works with correct HTML/XML to one that can understand the bulk of the incorrect HTML that exists in the real world.
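The shape of that pipeline (a repair pass in front of a strict parser) can be sketched as follows. This does not use TidyLib or its API; a toy tag-balancing pass built on Python's standard `html.parser` stands in for it, and `xml.etree.ElementTree` stands in for the strict renderer front end. The names `Repairer`, `VOID`, and `repair_and_parse` are hypothetical, the list of void elements is abbreviated, and the sketch assumes the input has a single outermost element and plain tag and attribute names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never take a closing tag (abbreviated, assumed list).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class Repairer(HTMLParser):
    """Toy repair pass: balance tags so a strict XML parser accepts the result."""

    def __init__(self):
        super().__init__()  # text arrives with character references decoded
        self.stack = []     # currently open elements
        self.out = []

    def handle_starttag(self, tag, attrs):
        text = "<" + tag
        seen = set()
        for name, value in attrs:
            if name in seen:  # XML forbids duplicate attributes
                continue
            seen.add(name)
            # Bare attributes get their own name as the value.
            text += " %s=%s" % (name, quoteattr(name if value is None else value))
        if tag in VOID:
            self.out.append(text + "/>")
        else:
            self.out.append(text + ">")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        while self.stack:  # close anything left open inside `tag`
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def finish(self):
        self.close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)


def repair_and_parse(html_text):
    """Repair sloppy markup, then hand it to a strict XML parser."""
    repairer = Repairer()
    repairer.feed(html_text)
    repaired = repairer.finish()
    return repaired, ET.fromstring(repaired)
```

The strict parser never sees the original input, only the repaired text, so it can stay simple. A real repair library does far more than this (implied tags, nesting rules, encodings), which is the argument for reusing one rather than growing a toy like the one above.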


--Robert
