Re: [Help-source-highlight] Order of definitions in source-highlight 2.10

From:	Lorenzo Bettini
Subject:	Re: [Help-source-highlight] Order of definitions in source-highlight 2.10
Date:	Fri, 05 Sep 2008 01:28:29 +0200
User-agent:	Thunderbird 2.0.0.16 (X11/20080724)

address@hidden wrote:

I just upgraded source-highlight to 2.10 and I am noticing some strange behavior.
Suppose we have the file foo.lang:

symbol = "/"
comment start "//"

And the file test.foo:

// foo

The language definition is taken from the source-highlight manual, section 7.4: "Order of definitions". Note that the definitions are in the wrong order, according to the manual: "The first expression will always be matched first, and the second expression will never be matched." And yet:

$ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo

<pre><tt>// foo
</tt></pre>

This was different with version 2.9:

$ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo

<pre><tt>// foo
</tt></pre>

What has changed between version 2.9 and 2.10?


Hi there

As I wrote in my previous email, the matching strategy changed between 2.9 and 2.10:

"The strategy used by source-highlight is to select the first rule that matches the longest part of the text with the smallest prefix (i.e., the initial part of the line that contains no language element). (Thus, as already noted in the previous sections, the order of language definitions is crucial.)"

However, while working on the documentation, I realized that this strategy is too involved and a little confusing, not to mention that it has a lot of overhead, since it tests ALL the rules in a state.
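
To make that overhead concrete, here is a minimal sketch of the quoted strategy. It is only an illustration, not the actual source-highlight code: it uses Python's `re` module instead of the regex library source-highlight relies on, the function name `select_longest` and the simplified rule patterns are hypothetical, and it assumes that "smallest prefix" is compared first, with the longest match and then the definition order used as tie-breakers.

```python
import re

def select_longest(rules, line):
    """Test every rule; prefer the smallest prefix, then the longest match,
    then the rule defined first (assumed reading of the quoted strategy)."""
    best = None  # (sort key, rule name, matched text)
    for index, (name, pattern) in enumerate(rules):
        m = re.search(pattern, line)
        if m is None:
            continue
        # Smaller prefix wins; on a tie the longer match wins; then rule order.
        key = (m.start(), -(m.end() - m.start()), index)
        if best is None or key < best[0]:
            best = (key, name, m.group())
    return None if best is None else (best[1], best[2])

# Hypothetical, simplified rules: the keyword is defined before the function.
rules = [
    ("keyword", r"\b(?:if|class)\b"),
    ("function", r"[A-Za-z_]\w*[ \t]*\([^()]*\)"),
]

print(select_longest(rules, "   if (exp)"))
```

Note that the loop never stops early: every rule is tried even when the first one already matches at the start of the line, and the longer "function" match beats the keyword that was defined first.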

Then I realized that, basically, the rule that should be selected is the one with the smallest prefix, but we can stop testing rules as soon as we find a rule that matches and whose prefix (i.e., the part of the string before the matched one) contains only spaces (or is empty). I think this is also the strategy used by standard regular expression engines, or at least it seems to be enough for programming languages.


Thus, for instance, if I have

i = null;

and I match null as a keyword, its prefix is "i = ", so I should not stop testing other rules; otherwise I would never test the symbol rule (which is defined later).


While, if I have

   if (exp)

as soon as I match "if" as a keyword, since its prefix contains only spaces, I can stop testing other rules. This way, I don't even risk matching "if (exp)" as a function call (note that with the previous strategy the function rule would match better, since it matches more characters).
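
The two examples above can be sketched as follows. This is a rough illustration under stated assumptions, not the real implementation: `select_early_stop` and the rule patterns are hypothetical names and simplifications, Python's `re` stands in for the actual regex engine, and "only spaces" is approximated with `str.strip()`, which also strips tabs.

```python
import re

def select_early_stop(rules, line):
    """Return the first rule whose prefix is empty or blank; otherwise
    fall back to the rule with the smallest prefix."""
    best = None  # (rule name, matched text, prefix length)
    for name, pattern in rules:
        m = re.search(pattern, line)
        if m is None:
            continue
        prefix = line[:m.start()]
        if prefix.strip() == "":
            # Nothing but blanks before the match: stop testing other rules.
            return name, m.group()
        if best is None or m.start() < best[2]:
            best = (name, m.group(), m.start())
    return None if best is None else best[:2]

# Hypothetical, simplified rules, in definition order.
rules = [
    ("keyword", r"\b(?:if|null)\b"),
    ("function", r"[A-Za-z_]\w*[ \t]*\([^()]*\)"),
    ("symbol", r"="),
]

print(select_early_stop(rules, "i = null;"))    # the symbol rule is still tested
print(select_early_stop(rules, "   if (exp)"))  # stops at the keyword rule
```

In the first call the keyword match has the non-blank prefix "i = ", so the search continues and the symbol rule, with the smaller prefix "i ", is selected. In the second call the keyword match is preceded only by spaces, so the function rule is never tried.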

I think this is the right strategy, and it makes the example in the documentation work again as described.

I've uploaded a temporary version that uses this strategy (and it also performs faster, as expected) here:


http://gdn.dsi.unifi.it/~bettini/source-highlight-2.10.1.tar.gz

I'd really appreciate some feedback: in particular, do you think that this new strategy makes sense?


There's also a new test in the tests directory: test_string_stop.lang:

keyword = "if|class"

type = 'int'

comment delim "/*" "*/"

# thus this won't catch "/* */ /" as a regexp,
# since comment elem definition comes first
regexp = '/.*/.*/'

# this won't match if ( ) as a function,
# since keyword elem definition comes first
function = '([[:alpha:]]|_)[[:word:]]*[[:blank:]]*\(*[[:blank:]]*\)'

# the following order is conceptually wrong,
# since "//" won't be highlighted as a comment, but as two symbols
symbol = "/"
comment start "//"

This can be used with the input file test_string_stop.java; it produces the attached output, which is the one expected with the new strategy.
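
The effect of the last two definitions (symbol "/" before comment start "//") can be reproduced with a small self-contained sketch. It is an approximation only: the `tokenize` helper is hypothetical, the two patterns are simplified stand-ins for the .lang definitions, and Python's `re` replaces the real regex engine.

```python
import re

def tokenize(rules, line):
    """Repeatedly apply the early-stop selection to the rest of the line."""
    tokens = []
    pos = 0
    while pos < len(line):
        chosen = None  # (rule name, match object)
        for name, pattern in rules:
            m = re.compile(pattern).search(line, pos)
            if m is None or m.end() == m.start():
                continue
            if line[pos:m.start()].strip() == "":
                # Blank or empty prefix: this rule wins, skip the later ones.
                chosen = (name, m)
                break
            if chosen is None or m.start() < chosen[1].start():
                chosen = (name, m)
        if chosen is None:
            break  # no rule matches the remaining text
        tokens.append((chosen[0], chosen[1].group()))
        pos = chosen[1].end()
    return tokens

# Same two rules, in the "wrong" and in the intended order.
wrong_order = [("symbol", r"/"), ("comment", r"//.*")]
right_order = [("comment", r"//.*"), ("symbol", r"/")]

print(tokenize(wrong_order, "// foo"))
print(tokenize(right_order, "// foo"))
```

With the symbol rule first, each "/" is consumed as a separate symbol and the comment rule never gets a chance; with the comment rule first, the whole line is taken as a comment.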


cheers
        Lorenzo

--
Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134     (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com
http://www.myspace.com/supertrouperabba
BLOGS: http://tronprog.blogspot.com  http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net

/* comment */ final /
/my/regexp/
  if ( ) {
    class;
    myfun ( );
  }
  int i;
  int ( );
// comment? or two symbols?
