Re: <reductions>
From: Wojciech Polak
Subject: Re: <reductions>
Date: Tue, 09 Oct 2007 18:26:45 +0200
Hello,
On Sun, 30 Sep 2007, Akim Demaille wrote:
> I can be wrong, but I'd feel better if the XML file was
> without redundancy, even if that requires a bit more work
> from the XSLT tools. Work that I guess can be factored with
> an XLST library tailored to our XML format (I'm using words
> I understand, but which I never practiced for real, so I
> might suggest stupid things here :).
I'd feel better too, but sometimes redundancy can simplify
the processing. Read on...
On Sun, 30 Sep 2007 20:57:40 -0400 Joel E. Denny wrote:
> In the automaton, instead of:
> <itemset>
> <rule number="0">
> <lhs>$accept</lhs>
> <rhs>
> <symbol class="nonterminal">exp</symbol>
> <symbol class="terminal">$end</symbol>
> <point/>
> </rhs>
> </rule>
> </itemset>
>
> we could have:
>
> <itemset>
> <item rule-number="0" marker="2" kernel="true" />
> </itemset>
I wrote a patch that generates the XML as above (except
for the kernel attribute, which I haven't finished).
The result is that the XML is smaller, but the processing
time is longer (XSLT via xsltproc). File sizes:
Before (with --report=all):
212K anubis.xml
1,4M awk.xml
144K bison.xml
28K calc.xml
2,1M c.xml
8,0K errors.xml
1,1M pascal.xml
964K rewrite.xml
96K sieve.xml
After (less redundancy, still with --report=all):
116K anubis.xml
592K awk.xml
108K bison.xml
16K calc.xml
796K c.xml
8,0K errors.xml
504K pascal.xml
392K rewrite.xml
60K sieve.xml
And the processing time:
Before (processing above XML files with --report=state and --report=all):
$ time make text
for i in xml-state/*.xml; do \
xsltproc xslt/xml2text.xsl $i >output-state-from-xml/`basename $i`.output; \
done
for i in xml-all/*.xml; do \
xsltproc xslt/xml2text.xsl $i >output-all-from-xml/`basename $i`.output; \
done
real 0m6.994s
user 0m6.500s
sys 0m0.147s
After (with less redundancy):
$ time make text
for i in xml-state/*.xml; do \
xsltproc xslt/xml2text.xsl $i >output-state-from-xml/`basename $i`.output; \
done
for i in xml-all/*.xml; do \
xsltproc xslt/xml2text.xsl $i >output-all-from-xml/`basename $i`.output; \
done
real 0m21.371s
user 0m21.216s
sys 0m0.115s
(perhaps my xml2text.xsl is not perfect, but still...)
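To put the figures above side by side, here is a small Python sketch that totals the listed file sizes and compares the two "real" times. It assumes the sizes are rounded du/ls-style values with a decimal comma and 1M = 1024K; the helper name to_kib is made up for this example. Note that the sizes cover only the --report=all files, while the timings cover both the --report=state and --report=all runs, so the two ratios are only roughly comparable.

```python
def to_kib(size):
    """Convert a size such as '212K' or '1,4M' (decimal comma) to KiB."""
    factor = {"K": 1, "M": 1024}[size[-1]]  # assumption: binary units
    return float(size[:-1].replace(",", ".")) * factor


# Sizes copied from the two listings above, in the same file order.
BEFORE = ["212K", "1,4M", "144K", "28K", "2,1M", "8,0K", "1,1M", "964K", "96K"]
AFTER = ["116K", "592K", "108K", "16K", "796K", "8,0K", "504K", "392K", "60K"]

size_ratio = sum(map(to_kib, AFTER)) / sum(map(to_kib, BEFORE))
time_ratio = 21.371 / 6.994  # "real" times of the two `make text` runs

print(f"size after/before: {size_ratio:.2f}")
print(f"time after/before: {time_ratio:.2f}")
```

With these inputs the XML shrinks to a bit over 40% of its former total size, while the xsltproc run takes about three times as long.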
Although the XML is smaller, processing it with XSLT is a little more
difficult (I adjusted xml2text.xsl to generate exactly the same
output as the original one in CVS) and about three times slower in
the run above. For me this is a trade-off between disk space on one
side and performance and ease of processing on the other. Even if
the XSLT is quite easy to adjust, it can be very difficult, or even
impossible in a straightforward way, to process the less redundant
XML with SAX, since SAX is stream-based and event-driven.
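To illustrate the SAX concern: in the compact form an <item> carries only a rule number and a marker position, so a streaming handler has to keep every rule in memory and can expand an item only if its rule was seen earlier in the stream. The Python sketch below uses the standard xml.sax module. The element and attribute names follow the snippets quoted in this thread, but the <report> and <grammar> wrappers and the names ItemResolver and expand_items are hypothetical. It also assumes that rules appear before the item sets. If they did not, a second pass or buffering of the items would be needed.

```python
import xml.sax

# Hypothetical layout: only <rule>, <lhs>, <rhs>, <symbol>, <itemset> and
# <item> come from the thread; <report> and <grammar> are assumed wrappers.
SAMPLE = """<report>
  <grammar>
    <rule number="0">
      <lhs>$accept</lhs>
      <rhs>
        <symbol class="nonterminal">exp</symbol>
        <symbol class="terminal">$end</symbol>
      </rhs>
    </rule>
  </grammar>
  <itemset>
    <item rule-number="0" marker="2"/>
  </itemset>
</report>"""


class ItemResolver(xml.sax.ContentHandler):
    """Expand compact <item> elements into 'lhs: sym . sym' strings."""

    def __init__(self):
        super().__init__()
        self.rules = {}    # rule number -> [lhs, [rhs symbols]]; kept for the whole parse
        self.items = []    # expanded items, in document order
        self._rule = None  # number of the rule currently being read
        self._text = None  # character buffer while inside <lhs> or <symbol>

    def startElement(self, name, attrs):
        if name == "rule":
            self._rule = attrs["number"]
            self.rules[self._rule] = ["", []]
        elif name in ("lhs", "symbol") and self._rule is not None:
            self._text = []
        elif name == "item":
            # Raises KeyError if the rule has not been seen yet in the stream.
            lhs, rhs = self.rules[attrs["rule-number"]]
            dot = int(attrs["marker"])
            self.items.append(" ".join([lhs + ":"] + rhs[:dot] + ["."] + rhs[dot:]))

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "rule":
            self._rule = None
        elif self._text is not None and name == "lhs":
            self.rules[self._rule][0] = "".join(self._text)
            self._text = None
        elif self._text is not None and name == "symbol":
            self.rules[self._rule][1].append("".join(self._text))
            self._text = None


def expand_items(xml_text):
    """Parse xml_text in one streaming pass and return the expanded items."""
    handler = ItemResolver()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items


print(expand_items(SAMPLE))
```

With the redundant form, the handler could print each item as its events arrive and would need no rule table at all. That is the processing ease being traded for file size.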
My patch for the C code and the XSLT, made against yesterday's CVS head,
is attached, although it won't work after Joel's commits from today.
On 30 Sep 2007, Joel E. Denny wrote:
> xml2xhtml.xsl and xml2text.xsl now share a template for computing
> conflicts. As Akim suggested, I've started a library. I named it
> bison.xsl.
Very good idea!
> I committed the following.
One more thing: I thought bison-patches (and similar lists)
was a list for posting changes before committing them (so we can
discuss the best solutions, etc.), not after...
Anyway, good work Joel.
> As we refactor the XML implementation to remove redundancies,
> this will make regression testing much easier.
Finally, I would be very careful about trying to remove all
redundancy from the XML. Disk space is cheap, but processing time,
performance, and ease of processing might not be (XSLT is not
the only way to process XML)... But of course we should try
to make the best of it :).
Regards,
Wojciech
bison-cvs.diff
Description: Text Data