Re: Tokenizing
From: Stephen Leake
Subject: Re: Tokenizing
Date: Mon, 22 Sep 2014 08:15:51 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (windows-nt)
Vladimir Kazanov <address@hidden> writes:
>> Ada mode uses text properties to store parse results; the tokenizer
>> results are part of that, but are not stored separately. I don't see
>> much point in separating the tokenizer from the parser; the tokenizer
>> results are not useful by themselves (at least, not in Ada mode).
>>
>
> First, this is not quite right. Tokenization results can be used, for
> example, for granular syntax highlighting.
Ada requires semantic parsing, not just tokenizing, for syntax
highlighting (it's a complex language :).
Hmm, I'm not sure what you mean by "granular" here; if you mean "less
than totally accurate", then you are right, we don't need a parser for
that. On the other hand, we don't need a tokenizer, either :).
In Ada, a parser is required to distinguish between these two instances
of 'return':
   function Foo (...) return Bar
   is begin
      Baz := ...;
      return Baz;
   end Foo;
In the first instance, "Bar" is a type; in the second, "Baz" is a
variable. They should have different faces.
Finding the 'function' keyword from just token information is
non-trivial. The parser tags Bar as a type.
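
To make the point concrete, here is a minimal sketch (in Python rather
than Emacs Lisp, and not Ada mode's actual parser) of the least context
needed to tell the two cases apart. The name `classify_returns` and the
tiny token set are hypothetical, and the rule "a 'return' between
'function' and 'is' names a type" is a simplifying assumption that real
Ada breaks in many ways, which is why a parser is needed:

```python
import re

# Identifiers/keywords, assignment, and a little punctuation; all other
# characters (including the "..." placeholders) are simply skipped.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|:=|[();]")

def classify_returns(source):
    """Label the identifier after each 'return' as 'type' or 'variable'.

    Assumption: a 'return' seen after 'function' and before the next 'is'
    belongs to the function specification; any other 'return' is a
    statement. This ignores nested declarations, access-to-function
    types, expression functions, and so on.
    """
    tokens = TOKEN_RE.findall(source)
    in_spec = False
    labels = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low == "function":
            in_spec = True
        elif low == "is":
            in_spec = False
        elif low == "return" and i + 1 < len(tokens):
            following = tokens[i + 1]
            # Skip a bare "return;" with no expression.
            if re.match(r"[A-Za-z_]", following):
                labels.append((following, "type" if in_spec else "variable"))
    return labels

SAMPLE = """
function Foo (...) return Bar
is begin
   Baz := ...;
   return Baz;
end Foo;
"""

print(classify_returns(SAMPLE))
```

Even this toy has to carry state (`in_spec`) from the 'function' keyword
forward; the token 'return' by itself says nothing about which face the
next identifier should get.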
> Font Lock basically just
> uses regexps to catch something that looks like
> comments/keywords/whatever.
But that can be extended by arbitrary functions in
font-lock-add-keywords; Ada mode does that to use the parser information
(when available; it doesn't force a parse just for font-lock).
> Second, it is not a tokenizer I want to build, there is a
> misunderstanding of sorts. It is a helper mode (similar to Font Lock,
> in a way) for keeping token lists up to date all the time, easy and
> fast. User code - the tokenizer itself - will just have to provide an
> interface to the mode (be restartable and supply required restart
> information in resulting tokens). The mode will use the information to
> avoid extra tokenizing.
Ok. Maybe I can use that, and have it run the parser whenever needed.
Just replace "token lists" with "some text properties" in the above; the
helper mode should not care if they are "tokenizer results" or "parser
results".
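
A rough sketch of what such a helper might look like, again in Python
rather than Emacs Lisp. Everything here is hypothetical: the class
`IncrementalCache`, the `scan_line(line, state) -> (result, next_state)`
interface, and the toy scanner are assumptions for illustration, not an
existing Emacs API. The saved per-line state plays the role of the
"restart information", and the cache does not care whether the results
are tokens or parser output:

```python
class IncrementalCache:
    """Keep per-line results current, rescanning as little as possible.

    Assumption: user code supplies `scan_line`, which can be restarted
    from any previously saved `state`.
    """

    def __init__(self, lines, scan_line, initial_state=None):
        self.lines = list(lines)
        self.scan_line = scan_line
        self.initial_state = initial_state
        n = len(self.lines)
        self.scanned = [False] * n       # has this line ever been scanned?
        self.entry_states = [None] * n   # state on entry to each line
        self.exit_states = [None] * n    # state on exit from each line
        self.results = [None] * n        # tokens, or any other result
        self.rescan_from(0)

    def rescan_from(self, start):
        """Rescan from line `start`; return the number of lines rescanned."""
        state = self.initial_state if start == 0 else self.exit_states[start - 1]
        count = 0
        for i in range(start, len(self.lines)):
            # A later line entered in the same state as before would give
            # the same result, so everything from here on is still valid.
            if i > start and self.scanned[i] and self.entry_states[i] == state:
                break
            self.entry_states[i] = state
            self.results[i], state = self.scan_line(self.lines[i], state)
            self.exit_states[i] = state
            self.scanned[i] = True
            count += 1
        return count

    def replace_line(self, index, text):
        """Replace one line and bring the cached results up to date."""
        self.lines[index] = text
        return self.rescan_from(index)


def toy_scan_line(line, in_comment):
    """Toy scanner (not Ada): words are tokens, '/*' ... '*/' is a comment
    that may span lines. The state is just whether we are in a comment."""
    tokens = []
    for word in line.split():
        if in_comment:
            if word == "*/":
                in_comment = False
        elif word == "/*":
            in_comment = True
        else:
            tokens.append(word)
    return tokens, in_comment
```

With lines `["a b", "c /* d", "e */ f", "g h"]`, replacing the first
line rescans only that line, because the next line is still entered
outside a comment. Removing the comment opener on the second line forces
a rescan of the third line as well, and then stops. Line insertion and
deletion are left out to keep the sketch short.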
>> I have not noticed any problems with the text properties interface; in
>> particular, storing and retrieving text properties is fast compared to
>> parsing. Ada mode stores about two parse result text properties per
>> source line on average.
>
> I did not know about your mode - and parsers are sort of my hobby :-)
> I will definitely check it out, especially because it uses GLR (it
> really does?!), which can be non-trivial to implement.
Yes, implementing GLR was complicated, and therefore fun :).
--
-- Stephe