Re: [Help-source-highlight] Order of definitions in source-highlight 2.10
From: Lorenzo Bettini
Subject: Re: [Help-source-highlight] Order of definitions in source-highlight 2.10
Date: Mon, 25 Aug 2008 10:47:16 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080724)
address@hidden wrote:
I just upgraded source-highlight to 2.10 and I am noticing some strange
behavior.
Suppose we have the file foo.lang:
symbol = "/"
comment start "//"
And the file test.foo:
// foo
The language definition is taken from the source-highlight manual,
section 7.4: "Order of definitions". Note that the definitions are in
the wrong order, according to the manual: "The first expression will
always be matched first, and the second expression will never be
matched." And yet:
$ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
<!-- Generator: GNU source-highlight 2.10
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt><span class="comment">// foo</span>
</tt></pre>
This was different with version 2.9:
$ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
<!-- Generator: GNU source-highlight 2.9
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt><span class="symbol">//</span><span class="normal"> foo</span>
</tt></pre>
What has changed between version 2.9 and 2.10?
Hi,
yes, the strategy for regular expression matching has changed. Before, it
used to build one huge regular expression with many alternatives; however,
this made handling things such as backreferences a real nightmare (the
backreference numbers had to be updated, and the number of backreferences
is limited to 9). In particular, it required splitting regular expressions,
and the code was really buggy.
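To illustrate why a single alternation makes definition order decisive, here is a minimal Python sketch. It is not source-highlight's actual code (which is C++); the rule list and the `//.*` pattern standing in for `comment start "//"` are assumptions made for this example. With leftmost, first-alternative matching, the `symbol` branch wins on `// foo` simply because it is listed first:

```python
import re

# Hypothetical reconstruction of the "one huge regular expression" idea:
# every language element becomes one named alternative, in definition order.
rules = [("symbol", r"/"), ("comment", r"//.*")]  # assumed patterns
combined = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in rules))

match = combined.search("// foo")
# The first alternative that matches at the leftmost position wins.
print(match.lastgroup, repr(match.group()))  # → symbol '/'
```

This mirrors the 2.9 output quoted above, where `//` was highlighted as two symbols. Any backreference inside an individual rule would also need renumbering once it is embedded in the combined expression, which is the difficulty described here.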
So in 2.10 I completely rewrote the handling of regular expressions
(http://www.gnu.org/software/src-highlite/source-highlight.html#fn-29);
in particular, each element now has its own regular expression and the
engine tests each expression, as explained in section 7.12 of the manual:
"As hinted at the beginning of Language Definitions, source-highlight
uses the definitions in the language definition file to internally
create, on-the-fly, regular expressions that are used to highlight the
tokens of an input file. Here we provide some internal details that are
crucial to understand how to write language definition files correctly [29].
First of all, for each element definition, an highlighting rule is created
by source-highlight (even if they correspond to the same language
element); thus, each language definition file will correspond to a list
of highlighting rules. For each line of the input file, source-highlight
will try to match all these rules against the whole line (more formally,
against the part of the line that has not been highlighted yet). It will
not stop as soon as an highlighting rule matched, since there might be
another rule that matches “better”.
The strategy used by source-highlight is to select the first rule that
matches the longest part of the text with the smallest prefix (i.e., the
initial part of the line that contains no language element). (Thus, as
already noted in the previous sections, the order of language
definitions is crucial.) Then, it will continue to search for another
matching rule for the remaining part of the line."
So the case of / and // respects this rule, since // matches better than /.
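The selection strategy quoted above can be sketched in Python as follows. This is an illustration under assumptions, not the real implementation: the function names `best_rule` and `tokenize` are hypothetical, the comment pattern `//.*` stands in for `comment start "//"`, and the ranking (smallest prefix first, then longest match, then definition order) is one reading of the manual's wording.

```python
import re

def best_rule(rules, text):
    """Pick the rule with the smallest prefix; break ties by longest match,
    then by definition order. Returns (name, match) or None."""
    best = None
    for name, pattern in rules:
        m = re.search(pattern, text)
        if m is None or not m.group():
            continue  # no match, or an empty match that makes no progress
        key = (m.start(), -len(m.group()))
        if best is None or key < best[0]:  # strict '<' keeps the earlier rule on ties
            best = (key, name, m)
    return None if best is None else (best[1], best[2])

def tokenize(rules, line):
    """Repeatedly apply best_rule to the part of the line not yet highlighted."""
    tokens, rest = [], line
    while rest:
        found = best_rule(rules, rest)
        if found is None:
            tokens.append(("normal", rest))
            break
        name, m = found
        if m.start() > 0:
            tokens.append(("normal", rest[:m.start()]))
        tokens.append((name, m.group()))
        rest = rest[m.end():]
    return tokens

rules = [("symbol", r"/"), ("comment", r"//.*")]  # same order as foo.lang
print(tokenize(rules, "// foo"))  # → [('comment', '// foo')]
print(tokenize(rules, "a / b // c"))
```

Both rules match `// foo` with an empty prefix, but the comment rule matches more text, so it is selected even though `symbol` is defined first. A lone `/` is still reported as a symbol, as the second call shows.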
Of course, you're right: the example of 7.4 does not work anymore and I
have to update the documentation with a better example! Sorry about
that, and thanks for the bug report.
Does this new strategy pose problems for your language definition?
hope to hear from you soon
cheers
Lorenzo
--
Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134 (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com
http://www.myspace.com/supertrouperabba
BLOGS: http://tronprog.blogspot.com http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net