help-source-highlight

Re: [Help-source-highlight] Order of definitions in source-highlight 2.10


From: Lorenzo Bettini
Subject: Re: [Help-source-highlight] Order of definitions in source-highlight 2.10
Date: Mon, 25 Aug 2008 10:47:16 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080724)

address@hidden wrote:
I just upgraded source-highlight to 2.10 and I am noticing some strange behavior.

Suppose we have the file foo.lang:

symbol = "/"
comment start "//"

And the file test.foo:

// foo

The language definition is taken from the source-highlight manual, section 7.4: "Order of definitions". Note that the definitions are in the wrong order, according to the manual: "The first expression will always be matched first, and the second expression will never be matched." And yet:

$ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
<!-- Generator: GNU source-highlight 2.10
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt><span class="comment">// foo</span>
</tt></pre>

This was different with version 2.9:

$ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
<!-- Generator: GNU source-highlight 2.9
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt><span class="symbol">//</span><span class="normal"> foo</span>
</tt></pre>

What has changed between version 2.9 and 2.10?

Hi

yes, the strategy for regular expression matching has changed: before, it used to build one huge regular expression with many alternatives. However, this made handling things such as backreferences a real nightmare (the backreference numbers had to be updated, and the number of backreferences is limited to 9); in particular, it required splitting regular expressions, and the code was really buggy.
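To make the backreference problem concrete, here is a minimal sketch using Python's re module as a stand-in. This is an assumption for illustration only: source-highlight's own regex engine is not Python's, and the patterns and variable names below are hypothetical. It shows that with one combined alternation the first alternative wins on "//", and that a backreference inside an embedded rule stops working until it is renumbered.

```python
import re

# Hypothetical per-element patterns, in definition order (as in foo.lang).
symbol = r"/"
comment = r"//.*"

# Old-style approach: one combined expression, one capturing group per element.
combined = re.compile(f"({symbol})|({comment})")
m = combined.match("// foo")
# (index of the group that matched, matched text); group 1 is the symbol rule.
first_alternative_wins = (m.lastindex, m.group(0))

# A hypothetical rule with its own backreference: a quoted string.
quoted = r"(['\"]).*?\1"
standalone_result = re.search(quoted, "'a'").group(0)

# Embedded naively after another group, \1 now refers to the "/" group.
naive = re.compile(f"({symbol})|({quoted})")
naive_result = naive.search("'a'")

# The backreference has to be renumbered (\1 becomes \3) to keep working.
renumbered = re.compile(f"({symbol})|(" + r"(['\"]).*?\3" + ")")
renumbered_result = renumbered.search("'a'").group(0)
```

In this sketch the naive combination silently fails to match the quoted string, which is the kind of bookkeeping that one expression per element avoids.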

so in 2.10 I completely rewrote the handling of regular expressions (http://www.gnu.org/software/src-highlite/source-highlight.html#fn-29); in particular, each element now has its own regular expression, and the engine tests each expression as explained in 7.12:

"As hinted at the beginning of Language Definitions, source-highlight uses the definitions in the language definition file to internally create, on-the-fly, regular expressions that are used to highlight the tokens of an input file. Here we provide some internal details that are crucial to understand how to write language definition files correctly [29].

First of all, for each element definition, a highlighting rule is created by source-highlight (even if they correspond to the same language element); thus, each language definition file will correspond to a list of highlighting rules. For each line of the input file, source-highlight will try to match all these rules against the whole line (more formally, against the part of the line that has not been highlighted yet). It will not stop as soon as a highlighting rule matches, since there might be another rule that matches “better”.

The strategy used by source-highlight is to select the first rule that matches the longest part of the text with the smallest prefix (i.e., the initial part of the line that contains no language element). (Thus, as already noted in the previous sections, the order of language definitions is crucial.) Then, it will continue to search for another matching rule for the remaining part of the line."

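So the case of / and // respects this rule: both rules match at the very start of the line (the same, empty prefix), but the // rule matches a longer part of the text, so it matches better than /.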
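The selection strategy quoted above can be sketched roughly as follows. This is one reading of the manual text, not source-highlight's actual code: the rule list, the regular expression chosen for "comment start", and the function names are all hypothetical, and Python's re module stands in for the real engine.

```python
import re

# Hypothetical rules mirroring foo.lang, kept in definition order.
RULES = [
    ("symbol", re.compile(r"/")),
    ("comment", re.compile(r"//.*")),  # assumed: "start" runs to end of line
]

def best_match(rules, text):
    """Pick the rule with the smallest prefix, then the longest match.

    Ties are resolved in favour of the rule defined first, because a later
    rule only replaces the current best when it is strictly better.
    """
    best = None  # (name, prefix_length, matched_text)
    for name, regex in rules:
        m = regex.search(text)
        if m is None:
            continue
        prefix, matched = m.start(), m.group(0)
        if (best is None
                or prefix < best[1]
                or (prefix == best[1] and len(matched) > len(best[2]))):
            best = (name, prefix, matched)
    return best

def highlight_line(rules, line):
    """Split a line into (element, text) tokens using best_match repeatedly."""
    tokens = []
    rest = line
    while rest:
        found = best_match(rules, rest)
        if found is None or not found[2]:
            tokens.append(("normal", rest))  # nothing left to highlight
            break
        name, prefix, matched = found
        if prefix:
            tokens.append(("normal", rest[:prefix]))
        tokens.append((name, matched))
        rest = rest[prefix + len(matched):]
    return tokens
```

Under these assumptions, highlight_line(RULES, "// foo") yields a single comment token even though symbol is defined first, which agrees with the 2.10 output shown above. Definition order still decides between rules whose matches have the same prefix and the same length.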

Of course, you're right: the example of 7.4 does not work anymore and I have to update the documentation with a better example! Sorry about that, and thanks for the bug report.

Does this new strategy pose problems for your language definition?

hope to hear from you soon
cheers
        Lorenzo

--
Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134     (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com
http://www.myspace.com/supertrouperabba
BLOGS: http://tronprog.blogspot.com  http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net




