Re: <reductions>

bison-patches

From:	Joel E. Denny
Subject:	Re: <reductions>
Date:	Tue, 9 Oct 2007 23:59:11 -0400 (EDT)

On Tue, 9 Oct 2007, Wojciech Polak wrote:

> My personal opinion is that I would stay for a while
> with the current solution, and not try to remove
> all redundancy too fast, because XSLT is not the only
> way to process XML (I just sent email about it to you
> and bison-patches).
> 
> What if one wants to process only a selected part
> of this XML? For instance with SAX? After removing
> all redundancy it's just very hard. My biggest .xml
> from Bison was C grammar - 2,1MB with --report-all.
> And I just think it's not too big.

On Tue, 9 Oct 2007, Wojciech Polak wrote:

> On Sun, 30 Sep 2007, Akim Demaille wrote:
> > I can be wrong, but I'd feel better if the XML file was
> > without redundancy, even if that requires a bit more work
> > from the XSLT tools.  Work that I guess can be factored with
> > an XSLT library tailored to our XML format (I'm using words
> > I understand, but which I never practiced for real, so I
> > might suggest stupid things here :).
> 
> I'd feel better too, but sometimes redundancy can simplify
> the processing. Read below...

I still feel that Bison should generate XML that has minimal redundancy 
and that is not tailored for any specific application, such as generating 
the --report text.  That is, the XML should just contain the basic data.

If a user finds that the Bison-generated XML is not optimal for his 
application (perhaps because it uses SAX), he can then write XSLT (perhaps 
with the help of Bison's XSLT library) to convert it to a form that is 
perfect for his application.  This new form may be completely different 
from the minimal XML that Akim and I have been discussing, and it may even 
be completely different from the XML that Bison generates now.

In other words, I don't think we can expect Bison to generate XML that 
will suit every user's needs.  Instead, we should just generate something 
general-purpose and minimal that can be massaged as needed.

> On Sun, 30 Sep 2007 20:57:40 -0400 Joel E. Denny wrote:
> > In the automaton, instead of:
> >      <itemset>
> >         <rule number="0">
> >           <lhs>$accept</lhs>
> >           <rhs>
> >             <symbol class="nonterminal">exp</symbol>
> >             <symbol class="terminal">$end</symbol>
> >             <point/>
> >           </rhs>
> >         </rule>
> >       </itemset>
> > 
> > we could have:
> > 
> >       <itemset>
> >         <item rule-number="0" marker="2" kernel="true" />
> >       </itemset>
> 
> I wrote the patch which generates the code as above (except
> for the kernel attribute which I haven't finished)
> and the result is that XML is smaller, but the processing
> time is longer (XSLT via xsltproc). File sizes:
> 
> Before (with --report=all):
> 212K    anubis.xml
> 1,4M    awk.xml
> 144K    bison.xml
> 28K     calc.xml
> 2,1M    c.xml
> 8,0K    errors.xml
> 1,1M    pascal.xml
> 964K    rewrite.xml
> 96K     sieve.xml
> 
> After (less redundancy, still with --report=all):
> 116K    anubis.xml
> 592K    awk.xml
> 108K    bison.xml
> 16K     calc.xml
> 796K    c.xml
> 8,0K    errors.xml
> 504K    pascal.xml
> 392K    rewrite.xml
> 60K     sieve.xml

I think this is a nice improvement.  Moreover, the first Torture Test 
(156: torture.at:139 Big triangle) no longer fails on my system.  
xsltproc can handle it now.

> And the processing time:
> 
> Before (processing above XML files with --report=state and --report=all):
> 
> $ time make text
> for i in xml-state/*.xml; do \
>  xsltproc xslt/xml2text.xsl $i >output-state-from-xml/`basename $i`.output; \
> done
> for i in xml-all/*.xml; do \
>  xsltproc xslt/xml2text.xsl $i >output-all-from-xml/`basename $i`.output; \
> done
> 
> real    0m6.994s
> user    0m6.500s
> sys     0m0.147s
> 
> After (with less redundancy):
> 
> $ time make text
> for i in xml-state/*.xml; do \
>  xsltproc xslt/xml2text.xsl $i >output-state-from-xml/`basename $i`.output; \
> done
> for i in xml-all/*.xml; do \
>  xsltproc xslt/xml2text.xsl $i >output-all-from-xml/`basename $i`.output; \
> done
> 
> real    0m21.371s
> user    0m21.216s
> sys     0m0.115s
> 
> (perhaps my xml2text.xsl is not perfect, but still...)

I tried existing.at's test grammars and found similar slowdowns.

I added this:

  <xsl:key
    name="ruleKey" match="/bison-xml-report/grammar/rules/rule"
    use="@number"
  />

In your xsl:template match="item", I use it for looking up rules.  For the 
existing.at grammars, the "real" (wall-clock) timings are nearly back to 
where they were before.

Before your patch:
  - GNU AWK: 1.471s
  - GNU Cim: 1.712s
  - GNU pic: 2.933s

After your patch:
  - GNU AWK: 5.248s
  - GNU Cim: 6.242s
  - GNU pic: 12.749s

After your patch with the xsl:key:
  - GNU AWK: 1.711s
  - GNU Cim: 1.939s
  - GNU pic: 3.339s

When I then use the key in xsl:template match="reduction" as well, I get:
  - GNU AWK: 1.595s
  - GNU Cim: 1.622s
  - GNU pic: 3.171s

There may be other fine-tuning that would help, but I haven't looked very 
hard.  I figured there would be a place for keys once we had 
cross-references, so I just tried that.  In any case, these times seem 
reasonable to me.

Interestingly, for ISO 2003 C++, I get:
  - Before your patch: 7.267s, 4.9MB
  - After your patch and the keys: 6.597s, 2MB
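
For anyone following along, here is a rough sketch of what a keyed lookup 
in an item template might look like.  It is not the actual change to 
xml2text.xsl.  The rule-number and marker attributes come from the example 
quoted earlier in this message; the lhs/rhs/symbol layout of each rule in 
the grammar section, the //itemset/item path, and the output format are 
assumptions made only for illustration.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index each grammar rule by its number (the key from this message). -->
  <xsl:key
    name="ruleKey" match="/bison-xml-report/grammar/rules/rule"
    use="@number"
  />

  <!-- Hypothetical driver: print only the items, ignore everything else. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//itemset/item"/>
  </xsl:template>

  <!-- Hypothetical item template; the real xml2text.xsl prints more. -->
  <xsl:template match="item">
    <!-- Capture these before xsl:for-each changes the context node. -->
    <xsl:variable name="marker" select="number(@marker)"/>
    <!-- Keyed lookup, rather than scanning every rule with
         /bison-xml-report/grammar/rules/rule[@number = current()/@rule-number] -->
    <xsl:variable name="rule" select="key('ruleKey', @rule-number)"/>

    <xsl:value-of select="$rule/lhs"/>
    <xsl:text>:</xsl:text>
    <xsl:for-each select="$rule/rhs/symbol">
      <!-- The point sits before the symbol at position marker + 1. -->
      <xsl:if test="position() = $marker + 1">
        <xsl:text> .</xsl:text>
      </xsl:if>
      <xsl:text> </xsl:text>
      <xsl:value-of select="."/>
    </xsl:for-each>
    <!-- Point at the very end of the right-hand side. -->
    <xsl:if test="$marker = count($rule/rhs/symbol)">
      <xsl:text> .</xsl:text>
    </xsl:if>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The point of the key is that the processor can build the rule index once 
per document, so each item no longer needs a predicate scan over every rule.  
That is consistent with the timings above, though I have not profiled 
xsltproc to confirm that this is the whole story.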

> Although XML is smaller, the processing with XSLT is a little more
> difficult (I adjusted xml2text.xsl to generate exactly the same
> output as the original one in CVS) and slower. For me this is
> disk space vs performance and processing ease. Even if XSLT is
> quite easy to adjust, it can be very difficult, or even impossible,
> to process the less redundant XML directly with SAX, as SAX is
> stream-based, event-driven processing.

I think the XSLT isn't too difficult.  It should be straightforward for a 
user to write XSLT to produce XML more suited to applications using SAX.
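
As a minimal sketch of that idea, the stylesheet below copies the report 
unchanged except that it expands each compact <item> back into the 
self-contained <rule> form quoted earlier, so a SAX consumer would not need 
to resolve cross-references itself.  The lhs/rhs/symbol layout of the rules 
in the grammar section is an assumption here, and nothing like this is part 
of Bison's XSLT library as far as this thread shows.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index each grammar rule by its number. -->
  <xsl:key
    name="ruleKey" match="/bison-xml-report/grammar/rules/rule"
    use="@number"
  />

  <!-- Identity transform: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Expand a compact item into an inline rule with a point marker. -->
  <xsl:template match="itemset/item">
    <xsl:variable name="marker" select="number(@marker)"/>
    <xsl:variable name="rule" select="key('ruleKey', @rule-number)"/>
    <rule number="{@rule-number}">
      <lhs><xsl:value-of select="$rule/lhs"/></lhs>
      <rhs>
        <!-- Symbols before the point, the point, then the remaining symbols. -->
        <xsl:copy-of select="$rule/rhs/symbol[position() &lt;= $marker]"/>
        <point/>
        <xsl:copy-of select="$rule/rhs/symbol[position() &gt; $marker]"/>
      </rhs>
    </rule>
  </xsl:template>
</xsl:stylesheet>
```

It could be run the same way as xml2text.xsl, for example with 
"xsltproc expand-items.xsl c.xml", where expand-items.xsl is a made-up 
file name for the stylesheet above.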

> My patch for C and XSLT against yesterday's CVS head is attached,
> although it won't work after Joel's commits today.

Actually, it worked for me just using the patch command.
