Re: Proposed interface changes for "parse.error custom"

bison-patches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposed interface changes for "parse.error custom"

From:	Adrian Vogelsgesang
Subject:	Re: Proposed interface changes for "parse.error custom"
Date:	Wed, 26 Feb 2020 22:58:26 +0000
User-agent:	Microsoft-MacOutlook/10.10.12.200112

Hi Akim

regarding our discussion on yysyntax_error_arguments
--------------------------------------------------------------------------

> You're proposal is to expose only (i) an access to the lookahead symbol, and 
> (ii) yyexpected_tokens, as opposed to exposing yysyntax_error_arguments that 
> gives them both.
yes, that sums it up quite nicely.

> Why not, but where/when would you expect to use one and not the others?
Use case for lookahead token without expected tokens:
I would want the lookahead, but not the actual expected tokens, if I want to 
report errors containing the actual lookahead but not the expected list.
E.g., if I am not using lookahead correction, my expected token list will be 
incorrect. In such a case I would prefer to still display the actual token, but 
without the list of expected tokens.

Use case for expected tokens, but without the lookahead token:
I am having a harder time to come up with a good use case here. But then again, 
this is the use case which we already support: We support direct access to the 
expected token list through yyexpected_tokens already in the current design.

Having thought about this a few more days, I found 3 more reasons why I dislike 
`yysyntax_error_arguments`:

1. I would like to treat expected tokens and the lookahead value independently. 
In my use case, the list of expected tokens will be filtered before being 
displayed to the user. The lookahead token itself would stay unfiltered (see 
https://lists.gnu.org/archive/html/bison-patches/2020-01/msg00001.html for the 
description of my use case). There is nothing really stopping me from doing 
this also with the current interface, but it feels a bit more cumbersome to me. 
I guess I would personally end up misusing yysyntax_error to always only return 
the lookahead token by using `yysyntax_error_arguments(&lookahead, 1)`, i.e. I 
would always ask it for only the lookahead token and retrieve the list of 
expected tokens independently using `yyexpected_tokens`.

2. As a new user of bison who wants to get the lookahead token, I would 
probably not realize that `yysyntax_error_arguments` does what I am looking 
for. A function named `yyget_lookahead` is much more self-describing. Yes, 
bison belongs to the category of programs which are hard to use without 
actually reading the documentation and I am sure that we would document 
`yysyntax_error_arguments` as the supported way to get the lookahead token. 
However, the reviewers of my code changes to our parser, or other people coming 
across my commits later, might not have read the bison docs and have a hard 
time to understand what exactly `yysyntax_error_arguments` does.

3. Naming this function `yysyntax_error_arguments` makes it harder to extend 
the provided information in the future. We are introducing the `ParseContext` 
concept to be able to potentially provide additional information in the future. 
But what would we do about `yysyntax_error_arguments` as soon as we add that 
additional information? Would we adjust `yysyntax_error_arguments` to also 
return that additional information and thereby break the interface? Would we 
keep `yysyntax_error_arguments` as is and introduce anadditional 
`yysyntax_error_arguments_extended` which also includes the newly added 
information? Would we only provide direct getters for the newly added 
information?

regarding our discussion on memory management for yyexpected_tokens
--------------------------------------------------------------------------------------------------

> If your use case in incrementally proposing possible keywords, then you want 
> them all, and a max of the possible number of tokens is available statically: 
> YYNTOKENS.
You are right, I would want to retrieve the whole list of expected tokens.

> I don't see where this would be useful.

The code I am having in mind to do so would be
    int onstack_tokens[10];
    std::vector<int> heap_tokens;
    int cnt_tokens = context.yyexpected_tokens(onstack_tokens, 10);
    if (cnt_tokens > 10) {
       heap_tokens.resize(cnt_tokens);
       context.yyexpected_tokens(heap_tokens.data(), cnt_tokens);
    }
    int* tokens = (cnt_tokens <= 10) ? onstack_tokens : heap_tokens.data();

That way, we can use the stack allocation in the common case (<= 10 expected 
tokens) and only fallback to the heap-allocation if we have too many expected 
tokens. At least in our grammar, it is common that only very few keywords are 
expected, except in a few places, where you can write almost everything. Now 
that I think about this, I am not actually sure how well this observation 
generalizes to other grammars, though. Preallocating YYNTOKENS (480 in our 
case) on the stack would cost us ~2KB of stack space. Probably that would be 
fine...

> The point here is efficiency
It's funny how both of us are having performance/efficiency as a goal but are 
coming to a different conclusion :) 

I was not seeing your point about "not spend time counting how many more there 
are" on my own. Thanks for pointing that out! With that reasoning, I can 
completely agree that the existing implementation is fine.

Cheers,
Adrian

On 26/02/2020, 19:30, "Akim Demaille" <address@hidden> wrote:

    Hi Adrian!

    > Le 24 févr. 2020 à 12:03, Adrian Vogelsgesang <address@hidden> a écrit :
    > 
    > Hi,
    > 
    > * We don’t have a getter for the lookahead token. It can only be 
retrieved through yysyntax_error_arguments. I would propose an independent 
getter for just the lookahead token.

    Why not, but where/when would you expect to use one and not the others?

    > * Should yysyntax_error_arguments really be part of the public interface? 
I see that it is used for the builtin error reporting and for testing error 
reporting in the test cases. Still I think that the use case of 
yysyntax_error_arguments is rather narrow and only really fits the error 
reporting style of bison. I would prefer to not expose yysyntax_error_arguments 
as part of the supported interface and instead provide direct access to the 
lookahead token.

    You're proposal is to expose only (i) an access to the lookahead symbol, 
and (ii) yyexpected_tokens, as opposed to exposing yysyntax_error_arguments 
that gives them both.

    Why not, indeed.  I preferred the current approach because so far we don't 
expose the list of expected tokens if we don't even have a lookahead.  It's 
kind of weird to report what you wanted when you cannot even say what you did 
not like.  And it's unintuitive that this can happen, so users might be 
surprised by this.

    > * yacc.c: yysyntax_error_arguments always write at index 0, even for 
nullptr argument.

    It should not, you are right!

    > * Java has a different interface for yyexpected_tokens than the other 
languages. It takes an additional yyoffset parameter. We need this, so 
yysyntax_error_arguments can write at offset 1. If we got rid of 
yysyntax_error_arguments, we could have the interface in sync across all 
languages.

    Right.

    > * I would prefer if yyexpected_tokens would always return the number of 
expected tokens instead of 0 in case the buffer is too small. I.e., if I pass 
in a pre-allocated array of 5 elements but there are 10 expected tokens, I 
would like still to get a return value of 10.

    The point here is efficiency: it's useless to report the user 50 expected 
tokens, so it does make sense to say 'gimme 5 at most' and not spend time 
counting how many more there are.  That's why we "break" when we go past the 
threshold.

    Yet returning 5 would not let you know where there are exactly 5, or more.  
That's why I return 0 in that case.

    > That way, I can always pass in a stack-allocated 5-element array as a 
first try and only do a heap allocation if there are more than 5 expected 
tokens.

    I don't see where this would be useful.

    If your use case in incrementally proposing possible keywords, then you 
want them all, and a max of the possible number of tokens is available 
statically: YYNTOKENS.

    If your use case is error messages, you certainly want some limit, say 5, 
and then you don't need to allocate on the heap.

    So I don't think one should worry about heap-allocation here in the common 
scenarios.

[Prev in Thread]

Current Thread

[Next in Thread]

Proposed interface changes for "parse.error custom", Adrian Vogelsgesang, 2020/02/24
- Re: Proposed interface changes for "parse.error custom", Akim Demaille, 2020/02/26
  - Re: Proposed interface changes for "parse.error custom", Adrian Vogelsgesang <=
    - Re: Proposed interface changes for "parse.error custom", Akim Demaille, 2020/02/27
    - Re: Proposed interface changes for "parse.error custom", Akim Demaille, 2020/02/27

Prev by Date: bench.pl: clean up the dust
Next by Date: Re: Proposed interface changes for "parse.error custom"
Previous by thread: Re: Proposed interface changes for "parse.error custom"
Next by thread: Re: Proposed interface changes for "parse.error custom"
Index(es):
- Date
- Thread