help-bison
Re: Syntax error messages


From: Christian Schoenebeck
Subject: Re: Syntax error messages
Date: Fri, 01 Oct 2021 23:30:39 +0200

On Friday, 1 October 2021 09:37:52 CEST Hans Åberg wrote:
> > On 28 Sep 2021, at 14:10, Christian Schoenebeck
> > <schoenebeck@crudebyte.com> wrote:
> > 
> > On Monday, 27 September 2021 22:07:33 CEST Hans Åberg wrote:
> >>>> In order to generate better syntax error messages writing out the input
> >>>> line with the error and a line with a marker underneath, I thought of
> >>>> checking how Bison does it, but I could not find the place in its
> >>>> sources. Specifically, a suggestion is to tweak YY_INPUT in the lexer
> >>>> to buffer one input line at a time, but Bison does not seem to do
> >>>> that.
> >>> 
> >>> No, I keep track of the byte offset in the file, and print from the
> >>> file, which I reopen to quote the source.
> >> 
> >> OK. I thought of this method, but then it does not work with streams.
> > 
> > In the past at least, builtin location support did not work well for me.
> > So I'm usually overriding location data type and behaviour with custom
> > type declaration, plus implementation on lexer side.
> > 
> > I also prefer this data type presentation:
> > 
> > // custom Bison location type to support raw byte positions
> > struct _YYLTYPE {
> >    int first_line;
> >    int first_column;
> >    int last_line;
> >    int last_column;
> >    int first_byte;
> >    int length_bytes;
> > };
> > #define YYLTYPE _YYLTYPE
> > #define YYLTYPE_IS_DECLARED 1
> > 
> > // override Bison's default location passing to support raw byte positions
> > #define YYLLOC_DEFAULT(Cur, Rhs, N)                         \
> > do                                                          \
> >  if (N)                                                    \
> >    {                                                       \
> >      (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line;     \
> >      (Cur).first_column = YYRHSLOC(Rhs, 1).first_column;   \
> >      (Cur).last_line    = YYRHSLOC(Rhs, N).last_line;      \
> >      (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;    \
> >      (Cur).first_byte   = YYRHSLOC(Rhs, 1).first_byte;     \
> >      (Cur).length_bytes = (YYRHSLOC(Rhs, N).first_byte  -  \
> >                            YYRHSLOC(Rhs, 1).first_byte) +  \
> >                            YYRHSLOC(Rhs, N).length_bytes;  \
> >    }                                                       \
> >  else                                                      \
> >    {                                                       \
> >      (Cur).first_line   = (Cur).last_line   =              \
> >        YYRHSLOC(Rhs, 0).last_line;                         \
> >      (Cur).first_column = (Cur).last_column =              \
> >        YYRHSLOC(Rhs, 0).last_column;                       \
> >      (Cur).first_byte   = YYRHSLOC(Rhs, 0).first_byte;     \
> >      (Cur).length_bytes = YYRHSLOC(Rhs, 0).length_bytes;   \
> >    }                                                       \
> > while (0)
> > 
> > Because sometimes you need high level column & line span, and sometimes
> > you rather need low level raw byte position & byte length in the input
> > data stream.
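
To illustrate the "implementation on lexer side" mentioned in the quote above, here is a minimal sketch in C of how a lexer might advance such a location for each token. It is not code from this thread: `struct loc` and `loc_advance` are hypothetical names, columns are counted in bytes, and `last_line`/`last_column` are assumed to point one past the end of the token.

```c
/* Hypothetical mirror of the custom location type quoted above. */
struct loc {
    int first_line, first_column;
    int last_line, last_column;
    int first_byte, length_bytes;
};

/* Advance `loc` over one token of `len` bytes. The new span starts where
   the previous one ended. Lines and columns are 1-based, columns count
   bytes, and last_line/last_column point one past the end of the token. */
void loc_advance(struct loc *loc, const char *text, int len)
{
    loc->first_line   = loc->last_line;
    loc->first_column = loc->last_column;
    loc->first_byte  += loc->length_bytes;  /* skip the previous token */
    loc->length_bytes = len;
    for (int i = 0; i < len; i++) {
        if (text[i] == '\n') {
            loc->last_line++;
            loc->last_column = 1;
        } else {
            loc->last_column++;
        }
    }
}
```

The location would start out as { 1, 1, 1, 1, 0, 0 }. In a Flex scanner, a function like this could be called from YY_USER_ACTION with yytext and yyleng, assuming every matched token (including whitespace and comments) passes through it so that the byte offsets stay in sync with the input.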
> 
> For the purpose of writing out the line in the error messages, this method
> (using C++) did not work out well, because I have two parsers, one for the
> language and one for directives, and it turns out to be difficult to pass
> the location information back to the top parser.
> 
> So instead, in addition to the input stream stack, I added two, for the
> current stream position, and the current stream line position. Because of
> the lexer buffering, they are computed in the lexer. These are properties
> attached to the input streams then, not the parser locations.
> 
> In the Bison type, I use line number and for columns the number of UTF-8
> characters. An ASCII caret marking the error is surprisingly accurate even
> in the presence of non-ASCII characters. But perhaps one should have a
> method to mark it on the line itself, not underneath.
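
The column convention described in the quote above (counting UTF-8 characters rather than bytes) can be sketched as follows. This is only an illustration, not code from this thread: `utf8_column` is a hypothetical helper, it assumes valid UTF-8 input, and it counts code points, so combining characters and double-width glyphs can still shift a caret printed underneath.

```c
#include <stddef.h>

/* Count the UTF-8 code points in the first `nbytes` bytes of `line`.
   Bytes of the form 10xxxxxx are continuation bytes and are skipped,
   so each multi-byte character is counted once. Assumes valid UTF-8. */
size_t utf8_column(const char *line, size_t nbytes)
{
    size_t col = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)line[i] & 0xC0) != 0x80)
            col++;
    }
    return col;
}
```

For the line "Åberg x", the "x" sits at byte offset 7 but has 6 characters in front of it, so a caret should be indented by 6 spaces, not 7.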

Hmm, do those two parsers run independently of each other, or do you rather 
mean that you have coupled them in a way that they influence each other's 
behaviour *while* they are still running?

So far I have not encountered any restriction with my location approach. I'm 
using it for all kinds of things: warnings and errors on the CLI, of course, 
highlighting of the same in code editors, but also code refactoring. The 
latter only works well with a fully language-aware parser, unlike the typical 
regex hacks.
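
As a rough illustration of how a raw byte position and byte length can drive a CLI diagnostic, the sketch below quotes the offending source line and puts a marker underneath. It is not code from this thread: `struct byte_span` and `quote_span` are hypothetical names, the input is assumed to be fully in memory, and columns are counted in bytes, so tabs and multi-byte characters are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the byte-position part of a custom location. */
struct byte_span {
    int first_byte;    /* offset of the first byte of the span in the input */
    int length_bytes;  /* number of bytes covered by the span */
};

/* Write the source line containing `span` into `out`, followed by a marker
   line with '^' under the first byte and '~' under the rest of the span
   (clipped to the end of that line). Columns are counted in bytes.
   Returns 0 on success, -1 if the span is out of range or `out` is too
   small. */
int quote_span(const char *src, size_t src_len, struct byte_span span,
               char *out, size_t out_size)
{
    if (span.first_byte < 0 || (size_t)span.first_byte > src_len)
        return -1;

    size_t pos = (size_t)span.first_byte;
    size_t begin = pos;                 /* start of the enclosing line */
    while (begin > 0 && src[begin - 1] != '\n')
        begin--;
    size_t end = pos;                   /* end of the enclosing line */
    while (end < src_len && src[end] != '\n')
        end++;

    size_t line_len = end - begin;
    size_t col = pos - begin;
    size_t mark = span.length_bytes > 0 ? (size_t)span.length_bytes : 1;
    if (col + mark > line_len)          /* clip multi-line spans */
        mark = line_len > col ? line_len - col : 1;

    /* line + '\n' + indentation + marker + '\n' + terminating NUL */
    if (line_len + 1 + col + mark + 1 + 1 > out_size)
        return -1;

    char *p = out;
    memcpy(p, src + begin, line_len);  p += line_len;
    *p++ = '\n';
    memset(p, ' ', col);               p += col;
    *p++ = '^';
    memset(p, '~', mark - 1);          p += mark - 1;
    *p++ = '\n';
    *p = '\0';
    return 0;
}
```

Called with the input "x = 1;\ny = foo bar;\n" and a span of { 15, 3 }, this writes the line "y = foo bar;" followed by a line of eight spaces and "^~~". The same offsets could be handed to an editor for highlighting without any line or column arithmetic.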

Best regards,
Christian Schoenebeck




