Re: Tokenizing

emacs-devel



From:	Stephen Leake
Subject:	Re: Tokenizing
Date:	Mon, 22 Sep 2014 08:15:51 -0500
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.3 (windows-nt)

Vladimir Kazanov <address@hidden> writes:

>> Ada mode uses text properties to store parse results; the tokenizer
>> results are part of that, but are not stored separately. I don't see
>> much point in separating the tokenizer from the parser; the tokenizer
>> results are not useful by themselves (at least, not in Ada mode).
>>
>
> First, this is not quite right. Tokenization results can be used, for
> example, for granular syntax highlighting.

Ada requires semantic parsing, not just tokenizing, for syntax
highlighting (it's a complex language :).

Hmm, I'm not sure what you mean by "granular" here; if you mean "less
than totally accurate", then you are right, we don't need a parser for
that. On the other hand, we don't need a tokenizer, either :).

In Ada, a parser is required to distinguish between these two instances
of 'return':

function Foo (...) return Bar
is begin
  Baz := ...;
  return Baz;
end Foo;

In the first instance, "Bar" is a type; in the second, "Baz" is a
variable. They should have different faces.

Finding the enclosing 'function' keyword from token information alone
is non-trivial; the parser, by contrast, tags Bar as a type directly.
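To make the point concrete, here is a toy Python sketch (hypothetical, not Ada mode's code; Ada mode uses a GLR parser) showing that the role of the name after 'return' depends on context the tokens themselves do not carry. The '...' elisions in the example above are filled with placeholder code, and a single flag, "inside a function specification", stands in for what a real parser derives from the grammar. It will misclassify nested subprograms, access-to-function types, and many other cases.

```python
import re

# Toy lexer for the fragment above: identifiers/keywords, ':=' and single
# punctuation characters. Whitespace is skipped.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|:=|[();:.,]")

def classify_return_targets(source):
    """Return (identifier, role) for each identifier that follows 'return'.

    One flag stands in for parser context: between 'function' and 'is' we
    are in a subprogram specification, so the name is a type; elsewhere it
    is treated as an expression (here, a variable).
    """
    tokens = TOKEN_RE.findall(source)
    in_spec = False
    results = []
    for i, tok in enumerate(tokens):
        word = tok.lower()  # Ada keywords are case-insensitive
        if word == "function":
            in_spec = True
        elif word == "is":
            in_spec = False
        elif word == "return" and i + 1 < len(tokens) and tokens[i + 1][0].isalpha():
            results.append((tokens[i + 1], "type" if in_spec else "variable"))
    return results

EXAMPLE = """
function Foo (X : Integer) return Bar
is begin
  Baz := X;
  return Baz;
end Foo;
"""

print(classify_return_targets(EXAMPLE))  # [('Bar', 'type'), ('Baz', 'variable')]
```

The same token, 'return' followed by an identifier, yields two different roles; only context tracked across many tokens separates them.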

> Font Lock basically just
> uses regexps to catch something that looks like
> comments/keywords/whatever. 

But that can be extended by arbitrary functions in
font-lock-add-keywords; Ada mode does that to use the parser information
(when available; it doesn't force a parse just for font-lock).
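In font-lock-keywords, a MATCHER may be a function: font-lock calls it with one argument, the limit of the search, and it should set the match data and return non-nil when it finds a match. The Python sketch below mimics that protocol loosely to show a matcher that only consults cached parser results and matches nothing when no parse has run yet. It is an analogue, not Emacs Lisp; the cache layout (start position mapped to end and face) and the names make_matcher and fontify_region are hypothetical. In Ada mode the cached results live in text properties.

```python
import bisect

def make_matcher(cache):
    """Build a font-lock-style matcher over cached parse results.

    cache maps a start position to (end, face), or is None when no parse
    has run. The matcher never triggers a parse; with no cache it simply
    finds nothing, leaving ordinary regexp highlighting in charge.
    """
    starts = sorted(cache) if cache is not None else []

    def matcher(point, limit):
        # Next cached entry starting in [point, limit), if any.
        i = bisect.bisect_left(starts, point)
        if i == len(starts) or starts[i] >= limit:
            return None
        start = starts[i]
        end, face = cache[start]
        return start, end, face

    return matcher

def fontify_region(matcher, start, limit):
    """Call the matcher repeatedly, advancing past each match."""
    matches = []
    point = start
    while True:
        m = matcher(point, limit)
        if m is None:
            return matches
        matches.append(m)
        point = max(m[1], m[0] + 1)  # always make progress

cache = {32: (35, "font-lock-type-face"), 60: (63, "font-lock-variable-name-face")}
print(fontify_region(make_matcher(cache), 0, 100))
print(fontify_region(make_matcher(None), 0, 100))  # []
```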

> Second, it is not a tokenizer I want to build; there is a
> misunderstanding of sorts. It is a helper mode (similar to Font Lock,
> in a way) that keeps token lists up to date at all times, simply and
> quickly. User code (the tokenizer itself) will just have to provide an
> interface to the mode: be restartable and supply the required restart
> information in the resulting tokens. The mode will use that information
> to avoid redundant tokenizing.

Ok. Maybe I can use that, and have it run the parser whenever needed.
Just replace "token lists" with "some text properties" in the above; the
helper mode should not care if they are "tokenizer results" or "parser
results".
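The helper-mode idea (a restartable tokenizer that records restart information in each token, plus a driver that avoids re-tokenizing unchanged text) can be sketched as below. This is a minimal illustration under stated assumptions, not Vladimir's design or Ada mode's code: the toy lexer, the Token fields and relex are hypothetical. After an edit, relex restarts from the last token that ends before the change, using the state stored in it, and stops as soon as a new token past the edit matches an old token at the shifted position with the same end state; from there on the old tokens are reused.

```python
import re
from dataclasses import dataclass, replace

# Toy token syntax: identifiers, numbers, single punctuation characters,
# and '{ ... }' comments that may span several lines.
WORD = re.compile(r"(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<punct>\S)")

@dataclass(frozen=True)
class Token:
    start: int
    end: int
    kind: str
    state: str  # lexer state after this token: the restart information

def lex(text, pos=0, state="code"):
    """Restartable lexer: comments are emitted one token per line, so the
    state after a token may be 'comment'. Lexing can resume from any
    token's (end, state) pair."""
    n = len(text)
    while pos < n:
        if text[pos] == "\n" or (state == "code" and text[pos].isspace()):
            pos += 1
        elif state == "comment":
            close = text.find("}", pos)
            nl = text.find("\n", pos)
            if close != -1 and (nl == -1 or close < nl):
                end, state = close + 1, "code"
            else:
                end = n if nl == -1 else nl  # comment continues on next line
            yield Token(pos, end, "comment", state)
            pos = end
        elif text[pos] == "{":
            state = "comment"  # the comment branch emits this line
        else:
            m = WORD.match(text, pos)
            yield Token(pos, m.end(), m.lastgroup, state)
            pos = m.end()

def relex(old_tokens, new_text, edit_start, removed, inserted):
    """Update old_tokens after `removed` chars at edit_start were replaced
    by `inserted` chars. Returns (tokens, number of tokens re-lexed)."""
    delta = inserted - removed
    # A token's extent can depend on the character just after it, so
    # restart after the last token ending strictly before the edit.
    keep = 0
    while keep < len(old_tokens) and old_tokens[keep].end < edit_start:
        keep += 1
    if keep:
        pos, state = old_tokens[keep - 1].end, old_tokens[keep - 1].state
    else:
        pos, state = 0, "code"
    old_index = {t.start: i for i, t in enumerate(old_tokens)}
    fresh = []
    for tok in lex(new_text, pos, state):
        fresh.append(tok)
        if tok.start < edit_start + inserted:
            continue  # still inside or before the changed text
        j = old_index.get(tok.start - delta)
        shifted = replace(tok, start=tok.start - delta, end=tok.end - delta)
        if j is not None and old_tokens[j] == shifted:
            # Same end and state over identical remaining text: the rest
            # of the old token list is still valid, only shifted.
            tail = [replace(t, start=t.start + delta, end=t.end + delta)
                    for t in old_tokens[j + 1:]]
            return old_tokens[:keep] + fresh + tail, len(fresh)
    return old_tokens[:keep] + fresh, len(fresh)

src = "x := 1;\n{ note\n  more }\ny := 2;\n"
tokens = list(lex(src))
tokens, relexed = relex(tokens, "xyz" + src[1:], 0, 1, 3)  # rename x to xyz
print(relexed)  # 2: only 'xyz' and ':' were re-lexed
```

Nothing in relex depends on the payload being tokens as such; following the suggestion above, the records could just as well carry parser results, provided each one stores a state from which processing can resume.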

>> I have not noticed any problems with the text properties interface; in
>> particular, storing and retrieving text properties is fast compared to
>> parsing. Ada mode stores about two parse result text properties per
>> source line on average.
>
> I did not know about your mode, and parsers are sort of my hobby :-)
> I will definitely check it out, especially because it uses GLR (it
> really does?!), which can be non-trivial to implement.

Yes, implementing GLR was complicated, and therefore fun :).

-- 
-- Stephe
