Re: Tokenizing

From: Vladimir Kazanov
Subject: Re: Tokenizing
Date: Sun, 21 Sep 2014 21:55:46 +0300

> I don't normally edit 7000 line files, so the Ada mode parsing delay is
> not noticeable to me, so I prefer the current Ada mode approach of not
> using the idle timer to trigger a parse. But it could be a user option.

I will look into that, although the main idea is to *keep the token
list consistent* most of the time. There will definitely be
customization possibilities.

> Ada mode uses text properties to store parse results; the tokenizer
> results are part of that, but are not stored separately. I don't see
> much point in separating the tokenizer from the parser; the tokenizer
> results are not useful by themselves (at least, not in Ada mode).

First, this is not quite right. Tokenization results can be used, for
example, for granular syntax highlighting. Font Lock basically just
uses regexps to catch something that looks like
comments/keywords/whatever. A tokenizer already *knows* for sure what it
found. And you don't have to build a full parser to use the results.
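
To make the difference concrete, here is a minimal sketch in Python
(not Emacs Lisp, and not code from any existing mode). The toy
grammar, the keyword set and all function names are assumptions made
up for illustration. The bare regexp search is a deliberate
simplification: Font Lock itself also has a syntactic pass for
strings and comments. The point is only that a token list already
carries the classification, so no guessing is needed.

```python
import re

# Hypothetical keyword set for a toy Ada-like language.
KEYWORDS = {"if", "then", "end"}

# Hypothetical toy tokenizer: alternatives are tried left to right,
# so comments and strings are consumed before bare words are seen.
TOKEN_RE = re.compile(r'''
    (?P<comment>--[^\n]*)
  | (?P<string>"[^"\n]*")
  | (?P<word>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<other>.)
''', re.VERBOSE)


def tokenize(text):
    """Return (kind, start, end) triples for the recognized tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "word" and m.group() in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, m.start(), m.end()))
    return tokens


def regexp_keywords(text):
    """Bare regexp guess: anything that merely looks like a keyword."""
    pattern = r"\b(?:if|then|end)\b"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def token_keywords(text):
    """Keyword spans taken from the token list, no guessing involved."""
    return [(s, e) for kind, s, e in tokenize(text) if kind == "keyword"]
```

On a line such as `if x then Put ("the end"); -- end if`, the bare
regexp search also reports the words inside the string and the
comment, while the token list marks only the two real keywords.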

Second, it is not a tokenizer I want to build; there is a
misunderstanding of sorts. It is a helper mode (similar to Font Lock,
in a way) for keeping token lists up to date all the time, easily and
quickly. User code - the tokenizer itself - will just have to provide an
interface to the mode (be restartable and supply the required restart
information in the resulting tokens). The mode will use the information
to avoid extra tokenizing.
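
A rough sketch of that idea, in Python rather than Emacs Lisp and
under simplifying assumptions: the tokenizer is stateless, so the
only restart information a token needs is its start offset, and it
never looks more than one character past a token's end. The names
`Token`, `tokenize_from` and `update` are hypothetical and do not
belong to any existing mode. After an edit, tokens that end strictly
before the edit are kept, tokenizing restarts right after them, and
it stops as soon as a new token lines up with an old one in the
unchanged (merely shifted) tail.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str
    start: int  # restart information: tokenizing may resume here
    end: int


# Toy grammar (an assumption): words, whitespace runs, single punctuation.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def tokenize_from(text: str, pos: int = 0) -> Iterator[Token]:
    """Restartable tokenizer: yields tokens starting at offset pos."""
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)  # always matches: the classes cover every char
        yield Token(m.lastgroup, m.start(), m.end())
        pos = m.end()


def update(old: List[Token], new_text: str, edit_pos: int,
           removed: int, inserted: int) -> Tuple[List[Token], int]:
    """Refresh the token list after `removed` chars at edit_pos were
    replaced by `inserted` chars. Returns (tokens, tokens re-scanned)."""
    delta = inserted - removed
    # Keep tokens ending strictly before the edit; a token that touches
    # the edit position may have grown or split, so it is redone.
    keep = 0
    while keep < len(old) and old[keep].end < edit_pos:
        keep += 1
    restart = old[keep - 1].end if keep else 0
    result = list(old[:keep])
    safe = edit_pos + inserted  # from here on the text is only shifted
    j, fresh = keep, 0
    for tok in tokenize_from(new_text, restart):
        # Skip old tokens that begin before the new token does.
        while j < len(old) and old[j].start + delta < tok.start:
            j += 1
        if tok.start >= safe and j < len(old) and old[j].start + delta == tok.start:
            # Resynchronized: reuse the old tail, shifted by delta.
            result.extend(Token(t.kind, t.start + delta, t.end + delta)
                          for t in old[j:])
            return result, fresh
        result.append(tok)
        fresh += 1
    return result, fresh
```

With a tokenizer that carries real state (say, "inside a multi-line
comment"), the state would have to be stored in each token as well and
compared before resynchronizing; the sketch leaves that out.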

> I have not noticed any problems with the text properties interface; in
> particular, storing and retrieving text properties is fast compared to
> parsing. Ada mode stores about two parse result text properties per
> source line on average.

I did not know about your mode - and parsers are sort of my hobby :-)
I will definitely check it out, especially because it uses GLR (it
really does?!), which can be non-trivial to implement.

Yours sincerely,

Vladimir Kazanov


