Re: Overlay mechanic improvements

From: Vladimir Kazanov
Subject: Re: Overlay mechanic improvements
Date: Sat, 20 Sep 2014 11:08:27 +0300

The problem here is that tokens are not just lexemes (token text). To
be general and granular enough, one also has to remember the
dependencies between tokens. So, a token is: ["lexeme", lookback,
lookahead, position]. Here:

 - "lexeme" is the string itself;

 - "lookahead" is the number of chars *after* the lexeme that have to
be read in order to delimit the lexeme (in the simplest case just the
whitespace between symbol names, i.e. a single char);

 - "lookback" is the number of previous tokens whose lookaheads extend
into the current token's zone of responsibility (this one can be
recalculated on the fly);

 - "position" is just the token's beginning/end.

Thus, we can define "a token" as the sum of a lexeme and the context
required to extract that lexeme. Whenever something changes within a
particular lexeme+lookahead interval, the token has to be reparsed,
and the n=lookback previous tokens also have to be fixed.
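
To make the rule above concrete, here is a minimal Python sketch (not
the actual implementation, which would live in Emacs Lisp). The names
Token, compute_lookbacks and tokens_to_relex are hypothetical, and so
are the lookahead values in the "a+b" example. Lookback is computed
conservatively here, as the distance back to the earliest token whose
lookahead reaches the current one.

```python
from dataclasses import dataclass


@dataclass
class Token:
    lexeme: str         # the token text itself
    start: int          # offset of the first char of the lexeme
    lookahead: int      # chars read past the lexeme to delimit it
    lookback: int = 0   # how many previous tokens depend on this one

    @property
    def end(self) -> int:
        return self.start + len(self.lexeme)

    def covers(self, pos: int) -> bool:
        # True if pos falls inside the lexeme+lookahead interval.
        return self.start <= pos < self.end + self.lookahead


def compute_lookbacks(tokens):
    # Assumption: lookback is the distance to the earliest previous
    # token whose lookahead extends past this token's start.
    for i, tok in enumerate(tokens):
        tok.lookback = 0
        for j in range(i):
            if tokens[j].end + tokens[j].lookahead > tok.start:
                tok.lookback = i - j
                break


def tokens_to_relex(tokens, pos):
    # Every token whose lexeme+lookahead interval contains the edit
    # is dirty, and so are its n=lookback predecessors.
    dirty = set()
    for i, tok in enumerate(tokens):
        if tok.covers(pos):
            dirty.update(range(max(0, i - tok.lookback), i + 1))
    return sorted(dirty)


# Hypothetical token stream for the text "a+b".
tokens = [Token("a", 0, 1), Token("+", 1, 1), Token("b", 2, 0)]
compute_lookbacks(tokens)
print(tokens_to_relex(tokens, 0))  # [0]
print(tokens_to_relex(tokens, 2))  # [0, 1, 2]
```

An edit at offset 2 touches "b" directly and also sits inside the
lookahead of "+", so the dirty set grows backwards through the
lookback counts.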

More on this can be found in this paper:
http://harmonia.cs.berkeley.edu/papers/twagner-lexing.pdf. Quite a
lengthy read, I must warn. What I like about it is that it *proves*
token stream consistency after fixing. My system will hopefully be
simplified compared to the original idea while keeping the proved

>      All I need
>     is an ability to save position pairs, the positions should survive text
>     insertion/deletion
> I see multiple meanings for that; could you clarify?

I mean something like the relative positioning of markers/overlays:
changes in unrelated parts of the buffer should shift the saved
positions accordingly.
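
A rough Python sketch of what I mean by position pairs that survive
edits, plus the lookup by buffer point. This is only an illustration:
Interval, shift_on_insert, shift_on_delete and intervals_at are
hypothetical names, and the behaviour for an insertion exactly at an
interval boundary is an arbitrary choice made here, not a statement
about how Emacs markers or overlays behave.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: int  # inclusive
    end: int    # exclusive


def shift_on_insert(intervals, pos, length):
    # Text inserted at or before an interval's start pushes the whole
    # interval right; text inserted strictly inside it extends it.
    for iv in intervals:
        if iv.start >= pos:
            iv.start += length
        if iv.end > pos:
            iv.end += length


def shift_on_delete(intervals, pos, length):
    # Positions after the deleted range move left; positions inside
    # the deleted range collapse to its start.
    def clamp(p):
        if p <= pos:
            return p
        if p >= pos + length:
            return p - length
        return pos

    for iv in intervals:
        iv.start = clamp(iv.start)
        iv.end = clamp(iv.end)


def intervals_at(intervals, point):
    # Linear scan; a real implementation would want a tree here.
    return [iv for iv in intervals if iv.start <= point < iv.end]


intervals = [Interval(0, 3), Interval(4, 7)]
shift_on_insert(intervals, 1, 2)   # insert 2 chars at offset 1
print(intervals)                   # [Interval(start=0, end=5), Interval(start=6, end=9)]
print(intervals_at(intervals, 6))  # [Interval(start=6, end=9)]
```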

>                        and there should be a way to find those pairs given a
>     buffer point.
> Text properties are designed to be preserved through copying of text.
> Overlays are not.  So it seems to me that you must use text properties.

Yes, properties do survive copying, but the token context might change
and the whole thing will have to be reparsed anyway. We can take a
piece of a comment and drop it into the middle of an expression. I
don't care whether the repositioned text was part of a token or not;
I just need to know that something changed within a particular
interval.

> For each token, you put a text property 'token' onto the characters in
> the token.  The value of the property would say what token they are.
> The property would be eq for all the characters in one token.
> Then you can use 'next-single-char-property-change' and
> 'previous-single-char-property-change' to find the end and the
> beginning of the token.

I chose overlays mostly because they let me control *intervals of
text*, and the intervals can overlap. For example, I can add a
modification-hook to an overlay, and it won't be called for every
character, just once for each overlapping interval. To quote the
"Special properties" documentation page for text properties: "you
can't predict how many times the function will be called". One can
imagine a workaround for this, but it would be just too cumbersome.

> If you run into any difficulties using the existing interfaces
> for text properties, we should improve the interfaces to make your
> program easier to write.

Should I just do that, or try both (improving overlays along the way)?
After all, overlays do have to be fixed anyway.

Yours sincerely,

Vladimir Kazanov

