[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [GSoC] "Parse using AST" Final Report

From: Vishal Gupta
Subject: Re: [GSoC] "Parse using AST" Final Report
Date: Thu, 23 Aug 2018 19:05:12 +0530

Apology for spacing issues as this was my first time doing the
formatting in plain text mode.
On Thu, Aug 23, 2018 at 6:26 PM Vishal Gupta <address@hidden> wrote:
> Hello everyone,
> Google Summer of Code provided me the opportunity to work with Automake during
> the summers. Here is the final report on work done during the same.
>                 ___________________________________________
>                               [GSoC'18] Parse using
>                               Abstract Syntax Tree - Final Report
>                 ___________________________________________
> Branch:- experimental/gsoc/ast
> Commit History:- cf633fb61 to 5aebec2cc
> 1. Introduction
> ============
> The goal of the project was to generate an abstract syntax tree after parsing
> the input . It has separate lexer, parser and tree generation
> module so that new features can be added incrementally. For debugging purpose
> the AST generated is displayed as image using graph description language.
> The current implementation can parse the following type of statements:
> * Primaries : All the primaries are identified.
> * Multiline statements : Statements on multiple lines ending with backslash
>   and its combination with other type of statements
> * Conditional statements : Single or nested conditional statements
>   [ if-else-endif block ]
> * Subdirs directive : The parser parses the file in the said
>   subdirectory and generates the tree in that subdirectory.
> * Include directive : The parser recursively parses the included file and
>   merge the generated tree to the parent tree by serializing at child process
>   and deserializing at parent process.
> 2. Technical Description
> ===================
> * Lexer :- It converts the input file line by line into tokens using perl
>   regex. Whenever called by parser, it returns an array of tokens. Each token
>   has a name and if required, a value.
> * Bison Grammar :- The grammar of Automake is written in GNU Bison. It is used
>   to generate state description and state transition diagram. Writing in Bison
>   is easy and exact syntax can be captured. Adding new features and debugging
>   them becomes easy as no change is required in parser. The tool is only
>   required at maintainer end to generate the parsing table.
> * Parsing Table :- It is an array of hashes. Each array index indicate the
>   state and hash consist of key value pair (Key for token acceptable for that
>   state and value is the next transition state). For reduction of grammar and
>   generation of tree, a special 'reduce' key is used in Hash. It points to
>   array containing number of tokens to reduce and corresponding function to
>   call in Tree module. The start state is 0 and acceptance state is stored in
>   variable with value decided by converter.
> * Converter :- It takes as input the state description and state transition
>   file generated from Bison and outputs the parsing table in a format
>   understood by perl.
> * Parser and AST :- It parses the input file using tokens provided by lexer
>   and the parsing table provided by converter. LR parsing algorithm
>   similar to this [] is implemented.
>   At each reduction of the grammar, the node is generated for the tree
>   by calling its corresponding function in tree module specified in parsing
>   table. After successful parsing, the parser outputs the generated tree in
>   DOT format.
> * Graph Description Language and dot utility :- For visualization of abstract
>   syntax tree, it is printed in the graph description language. It provide an
>   easy interface to display directed and undirected graphs. As the size of
>   input file increases, the tree increases in width significantly as compared
>   to height as all the statements identified during parsing are attached to
>   single master node. Unflatten utility is used to fix this. At last, dot
>   utility is used to convert the graph to image file.
> * Testing :- Test cases for features implemented are provided in “t” directory
>   A simple script is executed which generates AST for the test files.
>   Currently visual inspection of tree is required for validation.
> 3. Future Work
> =============
> * Make rules :- Currently only basic make rules are identified by parser.
>   Grammar needs to be extended to handle complex make rules.
> * Using AST to generate :- Current project was to generate AST
>   from input file. This AST can be used to generate Converting
>   AST to the output file is another big task. It would require to implement
>   how different variables are handled, what to output for corresponding
>   statement and how to integrate with the current code base.
> 4. Conclusion
> ============
> Learnt and enjoyed a lot. Applying the compiler theory learnt in class to
> practical work with a language I was knowing only some basics was a wonderfull
> experience. It also helped me to understand and follow good coding practices.
> Most of the automake features are implemented. I will be working to fix the
> grammar to identify complex make rules. With that, AST can be generated for
> most of the files which are identified by Automake. As this is a new feature,
> in the current state it cannot be merged in codebase. When generation of
> from AST be implemented can it be merged. I would like to continue
> working with that part, but some guidance in that direction will be required.
> Sorry for delay in presenting the report as I was not well.
> ---
> Vishal Gupta

reply via email to

[Prev in Thread] Current Thread [Next in Thread]