Re: <reductions>
From: Joel E. Denny
Subject: Re: <reductions>
Date: Tue, 9 Oct 2007 23:59:11 -0400 (EDT)
On Tue, 9 Oct 2007, Wojciech Polak wrote:
> My personal opinion is that I would stay for a while
> with the current solution, and not try to remove
> all redundancy too fast, because XSLT is not the only
> way to process XML (I just sent email about it to you
> and bison-patches).
>
> What if one wants to process only a selected part
> of this XML? For instance with SAX? After removing
> all redundancy it's just very hard. My biggest .xml
> from Bison was C grammar - 2,1MB with --report-all.
> And I just think it's not too big.
On Tue, 9 Oct 2007, Wojciech Polak wrote:
> On Sun, 30 Sep 2007, Akim Demaille wrote:
> > I can be wrong, but I'd feel better if the XML file was
> > without redundancy, even if that requires a bit more work
> > from the XSLT tools. Work that I guess can be factored with
> > an XSLT library tailored to our XML format (I'm using words
> > I understand, but which I never practiced for real, so I
> > might suggest stupid things here :).
>
> I'd feel better too, but sometimes redundancy can simplify
> the processing. Read below...

I still feel that Bison should generate XML that has minimal redundancy
and that is not tailored to any specific application, such as generating
the --report text. That is, the XML should just contain the basic data.

If a user finds that the Bison-generated XML is not optimal for his
application (perhaps because it uses SAX), he can then write XSLT (perhaps
with the help of Bison's XSLT library) to convert it to a form that is
perfect for his application. This new form may be completely different
from the minimal XML that Akim and I have been discussing, and it may be
completely different from even the XML that Bison generates now.

In other words, I don't think we can expect Bison to generate XML that
will suit every user's needs. Instead, we should just generate something
general-purpose and minimal that can be massaged as needed.

> On Sun, 30 Sep 2007 20:57:40 -0400 Joel E. Denny wrote:
> > In the automaton, instead of:
> >
> >   <itemset>
> >     <rule number="0">
> >       <lhs>$accept</lhs>
> >       <rhs>
> >         <symbol class="nonterminal">exp</symbol>
> >         <symbol class="terminal">$end</symbol>
> >         <point/>
> >       </rhs>
> >     </rule>
> >   </itemset>
> >
> > we could have:
> >
> >   <itemset>
> >     <item rule-number="0" marker="2" kernel="true" />
> >   </itemset>
>
> I wrote the patch which generates the code as above (except
> for the kernel attribute which I haven't finished)
> and the result is that XML is smaller, but the processing
> time is longer (XSLT via xsltproc). File sizes:
>
> Before (with --report=all):
> 212K anubis.xml
> 1,4M awk.xml
> 144K bison.xml
> 28K calc.xml
> 2,1M c.xml
> 8,0K errors.xml
> 1,1M pascal.xml
> 964K rewrite.xml
> 96K sieve.xml
>
> After (less redundancy, still with --report=all):
> 116K anubis.xml
> 592K awk.xml
> 108K bison.xml
> 16K calc.xml
> 796K c.xml
> 8,0K errors.xml
> 504K pascal.xml
> 392K rewrite.xml
> 60K sieve.xml
I think this is a nice improvement. Moreover, the first Torture Test
(156: torture.at:139 Big triangle) no longer fails on my system.
xsltproc can handle it now.
> And the processing time:
>
> Before (processing above XML files with --report=state and --report=all):
>
> $ time make text
> for i in xml-state/*.xml; do \
> xsltproc xslt/xml2text.xsl $i >output-state-from-xml/`basename $i`.output; \
> done
> for i in xml-all/*.xml; do \
> xsltproc xslt/xml2text.xsl $i >output-all-from-xml/`basename $i`.output; \
> done
>
> real 0m6.994s
> user 0m6.500s
> sys 0m0.147s
>
> After (with less redundancy):
>
> $ time make text
> for i in xml-state/*.xml; do \
> xsltproc xslt/xml2text.xsl $i >output-state-from-xml/`basename $i`.output; \
> done
> for i in xml-all/*.xml; do \
> xsltproc xslt/xml2text.xsl $i >output-all-from-xml/`basename $i`.output; \
> done
>
> real 0m21.371s
> user 0m21.216s
> sys 0m0.115s
>
> (perhaps my xml2text.xsl is not perfect, but still...)
I tried existing.at's test grammars and found similar slowdowns.
I added this:

  <xsl:key
    name="ruleKey" match="/bison-xml-report/grammar/rules/rule"
    use="@number"
  />

In your xsl:template match="item", I use it for looking up rules. For the
existing.at grammars, the real (wall-clock) timings are nearly back to
where they were before.
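
For anyone following along, here is a minimal sketch of how the key can be used inside the item template. It is not the actual xml2text.xsl. The rule-number and marker attributes come from the <item> form proposed earlier in this thread; I am assuming that each rule under grammar/rules carries the same lhs and rhs/symbol children as in the older itemset example, and the one-line text output is made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every rule by its number attribute. The processor can build
       this index once instead of rescanning all rules for each item. -->
  <xsl:key
    name="ruleKey" match="/bison-xml-report/grammar/rules/rule"
    use="@number"
  />

  <!-- Suppress stray text that the built-in templates would copy. -->
  <xsl:template match="text()"/>

  <!-- Assumed item form: <item rule-number="N" marker="M"/> -->
  <xsl:template match="item">
    <!-- key() replaces a scan such as
         /bison-xml-report/grammar/rules/rule[@number = current()/@rule-number] -->
    <xsl:variable name="rule" select="key('ruleKey', @rule-number)"/>
    <!-- Save the marker now: inside xsl:for-each the context node changes. -->
    <xsl:variable name="marker" select="@marker"/>

    <xsl:value-of select="$rule/lhs"/>
    <xsl:text>:</xsl:text>
    <xsl:for-each select="$rule/rhs/symbol">
      <!-- The dot goes before the symbol at position marker + 1. -->
      <xsl:if test="position() = $marker + 1"><xsl:text> .</xsl:text></xsl:if>
      <xsl:text> </xsl:text>
      <xsl:value-of select="."/>
    </xsl:for-each>
    <!-- The dot goes at the end when the marker is past the last symbol. -->
    <xsl:if test="$marker = count($rule/rhs/symbol)"><xsl:text> .</xsl:text></xsl:if>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The only part that matters for the timings is the key() call; the rest just shows that the rule data is still reachable from a compact item.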
Before your patch:
- GNU AWK: 1.471s
- GNU Cim: 1.712s
- GNU pic: 2.933s
After your patch:
- GNU AWK: 5.248s
- GNU Cim: 6.242s
- GNU pic: 12.749s
After your patch with the xsl:key:
- GNU AWK: 1.711s
- GNU Cim: 1.939s
- GNU pic: 3.339s
When I then use the key in xsl:template match="reduction" as well, I get:
- GNU AWK: 1.595s
- GNU Cim: 1.622s
- GNU pic: 3.171s
There may be other fine-tuning that would help, but I haven't looked very
hard. I figured there would be a place for keys once we had
cross-references, so I just tried that. In any case, these times seem
reasonable to me.
Interestingly, for ISO 2003 C++, I get:
- Before your patch: 7.267s, 4.9MB
- After your patch and the keys: 6.597s, 2MB
> Although XML is smaller, the processing with XSLT is a little more
> difficult (I adjusted xml2text.xsl to generate exactly the same
> output as the original one in CVS) and slower. For me this is
> disk space vs performance and processing ease. Even if XSLT is
> quite easy to adjust, it can be very difficult or even impossible
> to process low-redundancy XML directly with SAX, as it
> is stream-based, event-driven processing.
I think the XSLT isn't too difficult. It should be straightforward for a
user to write XSLT to produce XML more suited to applications using SAX.
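
As a rough illustration of what such a conversion might look like, the sketch below copies the report unchanged except that each compact item is expanded back into a self-contained rule with a <point/>, so that a streaming consumer never needs to look anything up. It assumes the <item rule-number="..." marker="..."/> form proposed in this thread and assumes that rules under grammar/rules have lhs and rhs/symbol children; the expanded output shape simply mirrors the older itemset example and is not something Bison itself produces.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index rules by number so each item lookup is cheap. -->
  <xsl:key
    name="ruleKey" match="/bison-xml-report/grammar/rules/rule"
    use="@number"
  />

  <!-- Identity transform: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each compact item with an inlined copy of its rule. -->
  <xsl:template match="itemset/item">
    <xsl:variable name="rule" select="key('ruleKey', @rule-number)"/>
    <xsl:variable name="marker" select="@marker"/>
    <rule number="{@rule-number}">
      <xsl:copy-of select="$rule/lhs"/>
      <rhs>
        <xsl:for-each select="$rule/rhs/symbol">
          <!-- Emit the point before the symbol at position marker + 1. -->
          <xsl:if test="position() = $marker + 1"><point/></xsl:if>
          <xsl:copy-of select="."/>
        </xsl:for-each>
        <!-- Emit the point last when the marker is past every symbol. -->
        <xsl:if test="$marker = count($rule/rhs/symbol)"><point/></xsl:if>
      </rhs>
    </rule>
  </xsl:template>

</xsl:stylesheet>
```

Something like `xsltproc expand.xsl report.xml > expanded.xml` (both file names are placeholders) would then give a SAX application the redundant form without Bison having to generate it.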
> My patch for C and XSLT against yesterday's CVS head is attached,
> although it won't work after Joel's commits today.
Actually, it worked for me just using the patch command.