[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Proposed interface changes for "parse.error custom"
From: |
Adrian Vogelsgesang |
Subject: |
Re: Proposed interface changes for "parse.error custom" |
Date: |
Wed, 26 Feb 2020 22:58:26 +0000 |
User-agent: |
Microsoft-MacOutlook/10.10.12.200112 |
Hi Akim
regarding our discussion on yysyntax_error_arguments
--------------------------------------------------------------------------
> You're proposal is to expose only (i) an access to the lookahead symbol, and
> (ii) yyexpected_tokens, as opposed to exposing yysyntax_error_arguments that
> gives them both.
yes, that sums it up quite nicely.
> Why not, but where/when would you expect to use one and not the others?
Use case for lookahead token without expected tokens:
I would want the lookahead, but not the actual expected tokens, if I want to
report errors containing the actual lookahead but not the expected list.
E.g., if I am not using lookahead correction, my expected token list will be
incorrect. In such a case I would prefer to still display the actual token, but
without the list of expected tokens.
Use case for expected tokens, but without the lookahead token:
I am having a harder time to come up with a good use case here. But then again,
this is the use case which we already support: We support direct access to the
expected token list through yyexpected_tokens already in the current design.
Having thought about this a few more days, I found 3 more reasons why I dislike
`yysyntax_error_arguments`:
1. I would like to treat expected tokens and the lookahead value independently.
In my use case, the list of expected tokens will be filtered before being
displayed to the user. The lookahead token itself would stay unfiltered (see
https://lists.gnu.org/archive/html/bison-patches/2020-01/msg00001.html for the
description of my use case). There is nothing really stopping me from doing
this also with the current interface, but it feels a bit more cumbersome to me.
I guess I would personally end up misusing yysyntax_error to always only return
the lookahead token by using `yysyntax_error_arguments(&lookahead, 1)`, i.e. I
would always ask it for only the lookahead token and retrieve the list of
expected tokens independently using `yyexpected_tokens`.
2. As a new user of bison who wants to get the lookahead token, I would
probably not realize that `yysyntax_error_arguments` does what I am looking
for. A function named `yyget_lookahead` is much more self-describing. Yes,
bison belongs to the category of programs which are hard to use without
actually reading the documentation and I am sure that we would document
`yysyntax_error_arguments` as the supported way to get the lookahead token.
However, the reviewers of my code changes to our parser, or other people coming
across my commits later, might not have read the bison docs and have a hard
time to understand what exactly `yysyntax_error_arguments` does.
3. Naming this function `yysyntax_error_arguments` makes it harder to extend
the provided information in the future. We are introducing the `ParseContext`
concept to be able to potentially provide additional information in the future.
But what would we do about `yysyntax_error_arguments` as soon as we add that
additional information? Would we adjust `yysyntax_error_arguments` to also
return that additional information and thereby break the interface? Would we
keep `yysyntax_error_arguments` as is and introduce anadditional
`yysyntax_error_arguments_extended` which also includes the newly added
information? Would we only provide direct getters for the newly added
information?
regarding our discussion on memory management for yyexpected_tokens
--------------------------------------------------------------------------------------------------
> If your use case in incrementally proposing possible keywords, then you want
> them all, and a max of the possible number of tokens is available statically:
> YYNTOKENS.
You are right, I would want to retrieve the whole list of expected tokens.
> I don't see where this would be useful.
The code I am having in mind to do so would be
int onstack_tokens[10];
std::vector<int> heap_tokens;
int cnt_tokens = context.yyexpected_tokens(onstack_tokens, 10);
if (cnt_tokens > 10) {
heap_tokens.resize(cnt_tokens);
context.yyexpected_tokens(heap_tokens.data(), cnt_tokens);
}
int* tokens = (cnt_tokens <= 10) ? onstack_tokens : heap_tokens.data();
That way, we can use the stack allocation in the common case (<= 10 expected
tokens) and only fallback to the heap-allocation if we have too many expected
tokens. At least in our grammar, it is common that only very few keywords are
expected, except in a few places, where you can write almost everything. Now
that I think about this, I am not actually sure how well this observation
generalizes to other grammars, though. Preallocating YYNTOKENS (480 in our
case) on the stack would cost us ~2KB of stack space. Probably that would be
fine...
> The point here is efficiency
It's funny how both of us are having performance/efficiency as a goal but are
coming to a different conclusion :)
I was not seeing your point about "not spend time counting how many more there
are" on my own. Thanks for pointing that out! With that reasoning, I can
completely agree that the existing implementation is fine.
Cheers,
Adrian
On 26/02/2020, 19:30, "Akim Demaille" <address@hidden> wrote:
Hi Adrian!
> Le 24 févr. 2020 à 12:03, Adrian Vogelsgesang <address@hidden> a écrit :
>
> Hi,
>
> * We don’t have a getter for the lookahead token. It can only be
retrieved through yysyntax_error_arguments. I would propose an independent
getter for just the lookahead token.
Why not, but where/when would you expect to use one and not the others?
> * Should yysyntax_error_arguments really be part of the public interface?
I see that it is used for the builtin error reporting and for testing error
reporting in the test cases. Still I think that the use case of
yysyntax_error_arguments is rather narrow and only really fits the error
reporting style of bison. I would prefer to not expose yysyntax_error_arguments
as part of the supported interface and instead provide direct access to the
lookahead token.
You're proposal is to expose only (i) an access to the lookahead symbol,
and (ii) yyexpected_tokens, as opposed to exposing yysyntax_error_arguments
that gives them both.
Why not, indeed. I preferred the current approach because so far we don't
expose the list of expected tokens if we don't even have a lookahead. It's
kind of weird to report what you wanted when you cannot even say what you did
not like. And it's unintuitive that this can happen, so users might be
surprised by this.
> * yacc.c: yysyntax_error_arguments always write at index 0, even for
nullptr argument.
It should not, you are right!
> * Java has a different interface for yyexpected_tokens than the other
languages. It takes an additional yyoffset parameter. We need this, so
yysyntax_error_arguments can write at offset 1. If we got rid of
yysyntax_error_arguments, we could have the interface in sync across all
languages.
Right.
> * I would prefer if yyexpected_tokens would always return the number of
expected tokens instead of 0 in case the buffer is too small. I.e., if I pass
in a pre-allocated array of 5 elements but there are 10 expected tokens, I
would like still to get a return value of 10.
The point here is efficiency: it's useless to report the user 50 expected
tokens, so it does make sense to say 'gimme 5 at most' and not spend time
counting how many more there are. That's why we "break" when we go past the
threshold.
Yet returning 5 would not let you know where there are exactly 5, or more.
That's why I return 0 in that case.
> That way, I can always pass in a stack-allocated 5-element array as a
first try and only do a heap allocation if there are more than 5 expected
tokens.
I don't see where this would be useful.
If your use case in incrementally proposing possible keywords, then you
want them all, and a max of the possible number of tokens is available
statically: YYNTOKENS.
If your use case is error messages, you certainly want some limit, say 5,
and then you don't need to allocate on the heap.
So I don't think one should worry about heap-allocation here in the common
scenarios.