lilypond-devel

Re: [GLISS] Existing syntax abominations


From: Janek Warchoł
Subject: Re: [GLISS] Existing syntax abominations
Date: Sun, 23 Sep 2012 09:37:55 +0200

On Sat, Sep 22, 2012 at 6:13 AM, David Kastrup <address@hidden> wrote:
> Janek Warchoł <address@hidden> writes:
>> After that, the parser's job is to group these 'words' into meaningful
>> 'sentences'.  For example,
>> c4 g \f d8-.
>> becomes
>> c4
>> g \f
>> d8-.
>> (i.e., all things that go with the pitch - a duration, articulations
>> etc - are merged together).
>
> The "merging" is hierarchical: - and . are merged to -., d and 8 are
> merged to d8, then d8 and -. are merged to d8-. and so on.  In fact, the
> whole input is finally merged into "start_symbol", and then the parser
> is done.

Indeed, that makes more sense than what i wrote :)
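
To picture the hierarchical merging for d8-. here is a minimal Python sketch.
It is only a toy: the node names ("note", "articulation", "event") and the
helper names are made up for illustration and are not the names used in
LilyPond's grammar.

```python
def reduce_event(pitch, duration, dash, script):
    """Merge four tokens bottom-up, one small reduction at a time."""
    articulation = ("articulation", dash, script)  # - and . are merged to -.
    note = ("note", pitch, duration)               # d and 8 are merged to d8
    return ("event", note, articulation)           # d8 and -. are merged to d8-.


def text_of(node):
    """Concatenate the leaf tokens of a nested tuple back into source text."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node[1:])


tree = reduce_event("d", "8", "-", ".")
print(tree)
print(text_of(tree))
```

The nesting of the tuples mirrors the order of the merges; a real parser
would keep merging until the whole input is one node.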

>> The problem is that sometimes it's impossible to tell what something
>> is without looking at next thing.  For example, when reading this
>> \markup " \ bla"
>> letter-by-letter, Lily sees
>> \  <= a beginning of a command
>> m  <= first letter of the command name
>> a  <= second letter of the command name
>> r   etc.
>> k
>> u
>> p
>>    <= whitespace - this means command name ended
>> "  <= beginning of a string
>>    <= space in the string
>> \  <= another character in the string
>>
>> b
>> l
>> a
>> "  <= end of the string.
>>
>> That was easy.  Now, take this:
>
> Bad example: as far as the _parser_ is concerned, a string is just a
> single entity.

well, i thought that the lexer uses lookahead too - my intent was to show
what happens when the lexer processes " \ bla".
I guess i mixed up parser and lexer in this example.

> That's one reason quoted strings can contain spaces: the
> lexer never passes them as a _kind_ of token by themselves, but it _can_
> pass them inside of the _value_ of a token of kind STRING.

Hmm.  Despite the fact that things don't happen in lily the way i've
shown, did i give a good example of the idea of lookahead?
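
On the point that a quoted string reaches the parser as one STRING token,
here is a small Python sketch of a lexer that behaves that way.  It is a toy
under stated assumptions: the token kinds WORD and STRING and the escape rule
(a backslash only escapes a quote or another backslash) are simplifications
chosen for illustration, not LilyPond's real lexer rules.

```python
def lex(text):
    """Split text into (kind, value) tokens; quoted strings stay whole."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace outside strings never becomes a token
        elif ch == '"':
            i += 1  # skip the opening quote
            value = []
            while i < len(text) and text[i] != '"':
                # Toy escape rule: \" and \\ stand for the second character.
                if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in '"\\':
                    i += 1
                value.append(text[i])
                i += 1
            i += 1  # skip the closing quote
            tokens.append(("STRING", "".join(value)))
        else:
            start = i
            while i < len(text) and not text[i].isspace() and text[i] != '"':
                i += 1
            tokens.append(("WORD", text[start:i]))
    return tokens


print(lex('\\markup " \\ bla"'))
```

The spaces and the backslash end up inside the value of the STRING token;
the consumer of the token list never sees them as separate items.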

> Lookahead is needed in
> cases like detecting the end of a music event.  A music event can be all
> of the following:
>
> c c'' c''8 c''8-. c''8-.-^  c
>
> How do we recognize when the music event ends?  By taking a look at the
> _next_ token and seeing whether we can make it part of the current music
> event.  So the decision what the current music event is depends on what
> appears next in the input.

Ah, that's indeed a better example. (James, is it clear?)
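
To make this concrete, here is a minimal Python sketch that groups the tokens
of David's example into music events by peeking at the next token.  The token
pattern and the function names are assumptions made for this illustration;
they cover only pitches a-g, octave quotes, durations and the two
articulations used above, and are not LilyPond's actual grammar.

```python
import re

# Toy tokens: a pitch letter, a run of quotes, a duration, or -. / -^
TOKEN_RE = re.compile(r"[a-g]|'+|\d+|-[.^]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def is_pitch(token):
    return len(token) == 1 and token in "abcdefg"


def group_events(tokens):
    """Group tokens into events; an event ends when the next token is a pitch."""
    events = []
    i = 0
    while i < len(tokens):
        current = [tokens[i]]
        i += 1
        # Lookahead: anything that is not a new pitch still belongs
        # to the current event, so keep absorbing it.
        while i < len(tokens) and not is_pitch(tokens[i]):
            current.append(tokens[i])
            i += 1
        events.append("".join(current))
    return events


print(group_events(tokenize("c c'' c''8 c''8-. c''8-.-^  c")))
```

The inner loop is the lookahead: the decision that an event is finished is
made only after inspecting a token that belongs to the next event.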

> Usually, something like { .... } does not require lookahead to form
> units since there is a closing delimiter.  Unfortunately, { ... } is not
> a complete unit until we have checked that no \addlyrics is trailing
> it, which _still_ can become part of the expression.

Indeed, this kind of defeats the purpose of using something called a
*closing delimiter*.

>> Lookahead means that before deciding what current letter in input
>> means, we look at the next one.
>
> Not "letter", but "token".

ok

>> So, every time Lily sees a backslash inside a string (inside " "), she
>> looks at the next letter in input to know whether the backslash is
>> just another char or has a special meaning.
>
> The lexer does not really work with "lookahead" as a rule: it can make
> more complex decisions (we take some pains to avoid this "backing up"
> for performance reasons, but it is not an inherent restriction).

ah, i guess this answers my previous question.

>> I'm not sure what lexer modes are, but i suppose that it's about
>> different rules in different contexts.  For example, when you're
>> inside a string you have to do a lookahead when you encounter a
>> backslash, but you don't have to do this when you're not inside
>> string.
>
> Strings are internal to the lexer: the parser never gets to see or
> influence string start and end.  There are other modes like lyricmode,
> markupmode, musicmode, chordmode and so on in which the tokens are being
> formed according to different rules.

Ah, so i mixed parser and lexer again.  But the "different rules in
different contexts" part holds true :)

>>> vI = \relative c'' { \clef "treble" \repeat unfold 40 g4 }
>>> \addQuote vIQuote { \vI }
>>
>> LilyPond says "i don't know what a \vI is.  \vI looks like a string,
>> and i don't want a string here"
>
> No, it does not look like a string.  The lexer sees \vI, recognizes it
> as a command and looks up its meaning.  It has no meaning, so it
> complains, and to pass anything at all to the parser, it passes the
> thing as a STRING to the parser, in the hope that this backslash might
> just have been part of something intended as a word.  It wasn't, and so
> the parser is the next one to complain that it has no idea what to do
> with a STRING in this context.

Ah, ok.  Anyway, the point is that Lily doesn't know what \vI is yet.
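
The two complaints David describes can be sketched in Python roughly like
this.  Everything here is hypothetical: the function names, the token kinds
and the error texts are invented for illustration and do not reproduce
LilyPond's real messages or internals.

```python
def lex_command(word, definitions, errors):
    """Turn a backslash word into a token; unknown names fall back to STRING."""
    name = word[1:]  # drop the leading backslash
    if name in definitions:
        return ("IDENTIFIER", definitions[name])
    # First complaint: the lexer cannot find a meaning for the command.
    errors.append(f"lexer: unknown command {word}")
    return ("STRING", word)


def parse_quote_argument(token, errors):
    """The toy parser wants music here, so a STRING causes a second complaint."""
    kind, value = token
    if kind != "IDENTIFIER":
        errors.append(f"parser: unexpected {kind}")
        return None
    return value


errors = []
token = lex_command("\\vI", {}, errors)  # vI has not been defined yet
parse_quote_argument(token, errors)
print(errors)
```

With an empty definitions table both stages complain in turn; once vI is in
the table, the same call goes through without errors.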

>>> Huh?  Why is \vI undefined at the time \addQuote is called?  Now since
>>> \addQuote is called in the lexer in this LilyPond version,
>>
>> David's experimental change resulted in \addQuote being called and
>> "calculated" during lexer phase.  This didn't happen before.
>
> That is not the actual problem.  The problem is that it is being called
> while the assignment has not yet been completed.

Oh, yes.  I didn't mean to say that \addQuote being called and
"calculated" during lexer phase is the actual problem.  I only stated
what the difference in behaviour is.

> Previously, music
> functions were calculated in the parser, so the parser would have looked
> at the next token MUSIC_FUNCTION (for \addQuote) and would have decided
> that it does not match ADDLYRICS, then it would have completed the
> assignment with MUSIC_FUNCTION as the lookahead, and only _then_ would
> have continued with the following music expression.

aha.
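
The ordering issue can be simulated with a few lines of Python.  This is a
deliberately crude sketch: the flag eager_lexer, the helper names and the log
text are all invented for illustration, and the real parser is of course far
more involved.

```python
def process(eager_lexer):
    """Simulate  vI = { ... }  followed by  \\addQuote vIQuote { \\vI }."""
    definitions = {}
    log = []

    def evaluate_add_quote():
        # Evaluating the function's argument looks up vI.
        state = "defined" if "vI" in definitions else "undefined"
        log.append(f"addQuote sees vI as {state}")

    def lex_lookahead():
        # The parser asks for the token after "}" to rule out \addlyrics.
        if eager_lexer:
            evaluate_add_quote()  # experimental: function runs inside the lexer
        return "MUSIC_FUNCTION"

    lookahead = lex_lookahead()
    if lookahead != "ADDLYRICS":
        definitions["vI"] = "music"  # only now is the assignment completed
    if not eager_lexer:
        evaluate_add_quote()  # previous behaviour: function runs in the parser
    return log


print(process(eager_lexer=True))
print(process(eager_lexer=False))
```

In the eager variant the lookup happens while the lookahead token is still
being produced, which is before the assignment to vI has been completed.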

>>> [snip long quote]

I don't know why you didn't delete this quote.  Please remove long
quotes that you don't directly reply to.

>> I'm not sure about mode-switching commands.
>> But generally, having to do excessive lookahead is bad.  You prefer to
>> know what's happening without looking ahead.
>
> Well, our syntax can't get along without lookahead.  But we have
> different modes, like lyrics mode, music mode, markup mode etc in which
> tokens are recognized differently.  It is the parser's job to switch
> between those modes, and if it makes this decision based on lookahead,
> the lookahead is still recognized in the previous mode and can't be
> reinterpreted in its "proper" mode.

so, "avoid lookahead if possible, especially when modes change.", right?

> This is actually the reason I recently made recognition of commands and
> strings the same in the various modes: previously line-width was a
> single lexical unit in INITIAL mode (which is used inside of context
> definitions and output definitions), but was three units, line - width
> in most other modes.  Now if you had music interspersed in INITIAL mode,
> this might have looked like
> { ... } line-width = ...
> and since } needed a lookahead token to be complete, the lookahead
> token, still scanned in music mode, would have been just line, and there
> would have been no way to get to the single STRING line-width later.

that was really messy then.  Good that you've fixed it.
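
For the record, the old line-width problem can be sketched in Python like
this.  The two regular expressions are stand-ins that i made up to mimic "one
unit in INITIAL mode, three units in music mode"; they are not the patterns
from LilyPond's lexer, and next_token is a hypothetical helper.

```python
import re

# Toy per-mode token rules: INITIAL keeps hyphenated words together,
# music mode splits them at the hyphen.
MODE_PATTERNS = {
    "INITIAL": re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*|\S"),
    "music": re.compile(r"[A-Za-z]+|\S"),
}


def next_token(text, pos, mode):
    """Return (token, new_pos), scanning from pos under the given mode's rules."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    match = MODE_PATTERNS[mode].match(text, pos)
    if match is None:
        return None, pos
    return match.group(), match.end()


text = "} line-width = 10"
# The parser needs a lookahead token right after "}" (position 1),
# but at that moment the lexer is still in music mode.
print(next_token(text, 1, "music"))
print(next_token(text, 1, "INITIAL"))
```

Once the lookahead has been scanned in music mode, the lexer has already
consumed "line" on its own, so switching to INITIAL mode afterwards cannot
bring back the single token line-width.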

thanks for the explanations,
Janek


