perlsgml-dev

Re: [Perlsgml-dev] [FYI/RFC] SGML::Parser::OpenSP and SGML::Parser...


From: Terje Bless
Subject: Re: [Perlsgml-dev] [FYI/RFC] SGML::Parser::OpenSP and SGML::Parser...
Date: Mon, 7 Apr 2003 10:14:57 +0200

Earl Hood <address@hidden> wrote:

>I personally have no problem if you want to use the perlSGML
>repository on savannah to host your work.  I, or Yann, could add
>you in as a developer if you are interested.  If your code is going
>to be licensed under the same terms as Perl, then there will be no
>licensing conflicts.  perlSGML is currently under the GPL, but I
>can re-license it to match Perl (which allows users to use the GPL
>or Artistic license).

Licensing and hosting issues haven't quite "gelled" yet. This is an
offshoot of the W3C Markup Validator project. Options tossed around are
running it as a W3C project in W3C semi-private CVS, a pure Sourceforge
project, a hybrid of the two, and now also Savannah and/or included in
perlSGML. There is also the possibility that it would be more appropriate
to include it in the OpenSP project as this is nominally "bindings" for
that specific parser (hmm. note to self: must remember to let
openjade-devel know about this code).

Licensing is not entirely clear either; one school of thought suggests this
would have to be under the W3C License (GPL-compatible), but that probably
isn't necessary. For ease of use, the Perl terms (i.e. GPL+Artistic) are
probably the sanest option.


>As for the package names, I do not have a problem if your work becomes
>SGML::Parser, assuming it becomes part of perlSGML.  Since perlSGML has
>not been touched in a long time, I would prefer not to have to make a
>new release just to accommodate a module name change if the distribution
>will no longer contain SGML::Parser.
>
>If your code is well-structured, I may even get motivated to contribute
>to the work you have done.  I'm currently doing some contract work
>involving SGML, so I at least have some renewed interest.

Heh! Therein lies some of the problem; the code is about as far from
structured as you could imagine. This project was started partially so I
could learn C++ and XS; I started out with pretty much zero knowledge of
either, or even of plain C. The code looks pretty much as you'd expect from
that.

Also, the "generic" API to OpenSP is rather limited -- I'm not sure yet
whether it can even come close to matching what SGML::Parser provides today! --
and I don't even implement "bindings" to it, just some functions that
happen to /use/ that API.

Right now you can parse an SGML document instance and get back a structure
(ref to list of hashes) containing all detected errors and a "complex"
datastructure representing the document tree as a pseudo-DOM.
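
As a rough illustration of what such a return value might look like, here
is a minimal sketch. Every key name in it (errors, tree, name, children,
line, column, message) and the element_names helper are assumptions made
up for the example, not the actual layout the code produces. It also shows
what "no access methods" means in practice: the caller has to walk the raw
structure by hand.

```perl
use strict;
use warnings;

# Hypothetical shape of a parse result. All key names are assumptions
# for illustration only, not the module's real layout.
my $result = {
    # "ref to list of hashes": one hash per detected error
    errors => [
        { line => 3, column => 7, message => 'element "foo" undefined' },
    ],
    # pseudo-DOM: nested hashes, one per element
    tree => {
        name     => 'html',
        children => [
            { name => 'head', children => [] },
            { name => 'body', children => [
                { name => 'p', children => [] },
            ] },
        ],
    },
};

# With no accessor methods, callers recurse over the raw hashes.
# Returns element names in document order (depth first).
sub element_names {
    my ($node) = @_;
    return ( $node->{name},
             map { element_names($_) } @{ $node->{children} } );
}

my @names = element_names( $result->{tree} );
printf "%d error(s); elements: %s\n",
    scalar @{ $result->{errors} }, join( ' ', @names );
```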

There are no facilities for incremental parsing, no callbacks, no access
methods for the "DOM", no utility methods for manipulating it, it only
parses XML so far (ironic ;D), etc. In fact, you can't even get attribute
values from it yet as I still need to write the code to extract the
"*CdataChunks" OpenSP uses internally[0].


And the inefficiency... Ye gods the sloth! :-)

Multiple passes over each datastructure (parse + post-parse building of the
datastructure) and multiple copies of each datastructure in memory (one
temporary, one for C++, and one for Perl/XS).



[0] - OpenSP uses ca. 1996-era C++ and it shows; all datatypes are
      internal hacks instead of using the STL. Strings are "unsigned
      char*", attribute values are returned as chunked "CharString*s",
      etc. Even I can see that this stuff is _crufty_!


>It would be nice to see perlSGML get updated to use more robust tool
>sets.  My initial goal of having a "pure-Perl" solution is no longer a
>goal wrt perlSGML.  I would prefer to have a robust tool set that people
>will use and I think something that hooks into OpenSP would be great.

Well, I don't think it would be appropriate to call my code SGML::Parser --
at least in the near future -- as the original is far more functional.
That's why I've been thinking about making a "SGML::Parser" that is a
proxy/common frontend module with pluggable backends. That would let your
SGML::Parser be SGML::Parser::Perl and be the default implementation, and
SGML::Parser::OpenSP be an option.
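
The frontend-with-backends idea could be sketched roughly as follows. This
is only one possible shape, not an existing design: the packages carry a
Sketch:: prefix so they are not mistaken for the real modules, and the
"backend" argument, the new/parse method names and the returned hash are
all assumptions for the example. The backends are stubbed inline so the
sketch runs on its own; in practice each would live in its own file.

```perl
use strict;
use warnings;

# Stand-in backends (hypothetical). A real one would wrap the pure-Perl
# parser or the OpenSP XS code; these only report who handled the call.
package Sketch::SGML::Parser::Perl;
sub new   { my ($class) = @_; return bless {}, $class }
sub parse { my ($self, $doc) = @_; return { backend => 'Perl' } }

package Sketch::SGML::Parser::OpenSP;
sub new   { my ($class) = @_; return bless {}, $class }
sub parse { my ($self, $doc) = @_; return { backend => 'OpenSP' } }

# The proxy/common frontend: picks a backend, then delegates to it.
package Sketch::SGML::Parser;

sub new {
    my ($class, %args) = @_;
    my $name    = $args{backend} // 'Perl';    # pure-Perl is the default
    my $backend = "Sketch::SGML::Parser::$name";
    die "Unknown backend '$name'\n" unless $backend->can('new');
    return bless { impl => $backend->new }, $class;
}

sub parse {
    my ($self, $doc) = @_;
    return $self->{impl}->parse($doc);         # same call for any backend
}

package main;

my $default = Sketch::SGML::Parser->new;
my $opensp  = Sketch::SGML::Parser->new( backend => 'OpenSP' );
print $default->parse('<doc></doc>')->{backend}, "\n";
print $opensp->parse('<doc></doc>')->{backend}, "\n";
```

Callers only ever see the frontend's interface, so swapping the
implementation is a matter of changing one constructor argument.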

I will likely have to do the same for the XML functionality in the
application; I need to do both SGML and XML, and while OpenSP does XML, its
support is limited. So I will probably have to make an
XML::Parser::OpenSP -- along with XML::Parser::Xerces and
XML::Parser::LibXML -- to plug into a generic XML::Parser frontend (not
called XML::Parser obviously, but analogous to the postulated SGML::Parser
frontend module).

IOW I have plenty of ambitions, but so far they are all vapourware and
would benefit greatly from a reality check before making any long term
plans based on them. :-)


I'll try to roll a release of my latest code in the not too distant future so
you can have a look at it (need to add minimal docs and attribute values
first).

I'm still in shock that I actually managed to write XS/C++ code that even
compiles, much less does something useful, so I'm pretty much wide open to
any and all suggestions about how to proceed and which direction to take
with this. :-)

I have ideas and ambitions, but I do not really expect them to survive
contact with reality...

-- 
Now Playing "Strange Fruit" by "Nina Simone",
 from the album "Feeling Good - The Very Best Of".



