help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Cleaning up memory in bison ?


From: David Fang
Subject: Re: Cleaning up memory in bison ?
Date: Fri, 21 Apr 2006 03:31:29 -0400 (EDT)

Hi,
        (My first post to the list, at a very late hour, be kind!)

        I have a way of doing this that involves a series of
reverse-engineering scripts, but I'll start by summarizing what I did at
in high-level.  Then we can get into details if need be.  This technique
for properly freeing C++-new allocated memory in the parser works soundly
with parsers generated by traditional yacc, byacc, and bison 1.35, 1.875,
2.0 so far.  (Been using for about 2 years.)

        Context: my YYSTYPE is a union of about a hundred different
pointer types, all of which are allocated using 'new' in C++.  (For added
efficiency, the most frequent union member types are even pool-allocated
with class-overridden new/delete operators.)  Flex just allocates (via
operator new) and constructs tokens and places them into yylval.  The
semantic actions in yyparse just build up syntax tree from these token.
I guarantee that all such heap-allocated 'tokens' and tree-nodes are
deleted if a parse error occurs.  (If the input is accepted all tokens are
built into a memory-manager syntax tree that automatically cleans up upon
destruction.) In C++, it is crucial to match new/delete, because malloc
doesn't construct/initialize, and free doesn't destroy members.  If you're
only in the C domain (as in the case of the original poster), then you
probably you will need the same concept if you have symbol types that need
to be freed recursively.


I provide a deallocating function:

/**
        This walks the parser's value/symbol stacks and calls
        the appropriate delete function by deducing which union member
        type corresponds to each symbol on the stack.
        This is called from our custom yyerror().
 */
static
void
yyfreestacks(const short* _yyss_, const short* _yyssp_,
                const YYSTYPE* _yyvs_, const YYSTYPE* _yyvsp_,
                const YYSTYPE _yylval_, const int _yychar_) {
        const short* s;
        const YYSTYPE* v;
        s=_yyss_+1;     // first entry is reserved
        v=_yyvs_+1;     // first entry is reserved
        for ( ; s <= _yyssp_ && v <= _yyvsp_; s++, v++) {
                if (v) {
                        yy_union_resolve_delete(*v, *(s-1), *s);
                        // herein lies the magic! described below...
                }
        }
        if (!lexer_at_eof()) {
                // custom-defined EOF detector checks lexer's buffer state
                // free the last token (if not EOF)
                yy_union_lookup_delete(_yylval_, _yychar_);
                // another looked-up deleting function
        }
}

/**
        Called upon parse error.
 */
static
void
yyerror(const char* msg) {
        // print a bunch of diagnostics here...
        // the free the value/symbol stacks
        yyfreestacks(yyss, yyssp, yyvs, yyvsp, yylval, yychar);
        // now we can throw an exception up to the caller of
        // yyparse() because we've cleaned up memory
}


Here are the guts of the deallocators.  Theses are actually all
automatically generated (from a template) by a chain of (don't laugh) awk
scripts whose purpose is to create tables that correlate symbols and
positions to union member types.  They take gram.yy, y.output as inputs
and produce lookup tables for the following functions.

/**
        \param u is the reference to a union on the symbol stack
        \param i is the previous state of the state transition
        \param j is the current state of the state transition.
                (i,j) describes the state transition from state i to j.
 */
void
yy_union_resolve_delete(const YYSTYPE& u, const short i, const short j) {
        const int k = yy_union_resolve_index(i,j);      // below...
        if (k >= 0)
                (*yy_union_delete[k])(u);
        // else cannot delete unknown type
}

/**
        \param u the union to deallocate.
        \param c the symbol ID of the last yylval.
 */
void
yy_union_lookup_delete(const YYSTYPE& u, const int c) {
        const int i = yy_union_lookup_index(c);         // below...
        if (i >= 0) (*yy_union_delete[i])(u);
}

/**
        Quick linked list structure of (state, type) pair.
        where state is j of an (i,j) state transition.
        the type_enum corresponds to a union_member type.
 */
typedef struct _yy_state_map_link_ yy_state_map_link;
struct _yy_state_map_link_ {
        int state;              /* state number to match */
        int type_enum;          /* enumerated type */
        const yy_state_map_link* next;
};


/**
        Given state transition (i,j), this returns an index corresponding
        to the deallocation function.
        There exists one deallocating function per union member type.
        The lookup tables in my quick and dirty implementation are
        statically allocated linked lists, so searching is slow in the
        worst case... but we don't care because this only occurs on syntax
        error.  The details herein are not important.
 */
static int
yy_union_resolve_index(const short i, const short j) {
const yy_state_map_link* iter = yysma[i];
/* sequentially compare state keys */
while (iter) {
        if(iter->state == j) break;
        iter = iter->next;
} /* end while */
if (iter && iter->state == j) {
        const int k = iter->type_enum;
        if (k >= 0) {
                assert(k < MAX_NUM_UNION_MEMBERS);
        } // else k < 0, signaling an error.
        return k;
} else {
        return -1;      // error, not found
}
}

/**
        Looks an enumerated map that matches the yylex's (int) return
        value to the index corresponding to the union member type.
 */
static int
yy_union_lookup_index(const int c) {
        if (c >= 0 && c < MAX_TOKEN_ENUMERATION) {
                const int i = token_to_type_enum_map[c]; // below...
                assert(i >= 0);
                assert(i < MAX_NUM_UNION_MEMBERS);
                return i;
        } else {
                return -1;
        }
}

/**
        This translates a symbol number into type number.
        The type enumeration is determined automatically by
        a script that reads the first section of the gram.yy file.
 */
static int
token_to_type_enum_map[MAX_NUM_UNION_MEMBERS] = { .... };

// An example of a snippet of auto-generated linked lists
// might look like this:

/************************** state 0 **************************/
static const yy_state_map_link yysml_0_0 = { 1, 19, NULL }; /* shift */
static const yy_state_map_link yysml_0_1 = { 2, 17, &yysml_0_0 }; /* goto */
static const yy_state_map_link yysml_0_2 = { 3, 20, &yysml_0_1 }; /* goto */
static const yy_state_map_link yysml_0_3 = { 4, 20, &yysml_0_2 }; /* goto */
static const yy_state_map_link yysml_0_4 = { 5, 19, &yysml_0_3 }; /* goto */

// goes on for hundreds of states...

The set of linked lists makes an array of linked lists where given
a state transition (i,j), i is the index into the array, and j is searched
in the selected linked list for a match.  Only valid state transitions
are considered/generated.

The start of the array declaration could look like:

static const yy_state_map_link* yysma[MAX_NUMBER_STATES] = {
        &yysml_0_4,
        NULL,           // state with only 'reduce' action
        ...
};

Finally, the array of deallocators looks like:

static void (*yy_union_delete[MAX_NUM_UNION_TYPES])(const YYSTYPE&) = {
        &yy_union_delete_member_type1,
        &yy_union_delete_member_type2,
        ...
};

where the deallocators have prototype:

static void
yy_union_delete_member_type1(const YYSTYPE& u) {
        if (u._member1)
                delete u._member1;      // C++-style
        // will call the appropriate delete, even if class-overridden
}

In summary, each union member on the parser's symbol stack is deallocated
through a function call that deduces the actual type of each union
member by looking at the state transitions.
with C++-style, delete-ing the appropriate union-member will also
call its destructors, thereby taking care of recursive deallocation.
(I would not want to write a free_me function for each union member
type by hand in C.)

I haven't included the crucial scripts in this post, but I hope I've
described the idea enough to be of some use.  Questions, comments, and
suggestions are welcome.

David



David Fang
Computer Systems Laboratory
Electrical & Computer Engineering
Cornell University
http://www.csl.cornell.edu/~fang/
        -- (2400 baud? Netscape 3.0?? lynx??? No problem!)


---------------------->8 snip 8<-----------------------
I am very new to bison and flex and wonder how to clean up pointers/memory
in bison properly.

Whenever I copied a string to the variable yylval.str in the flex file
like
this:
{WORD}         { yylval.str = strdup( yytext ); return WORD; }
I got a memory leak cause I did not delete the pointer copied by the
strdup()
function.
Here is the output of valgrind:

> valgrind --leak-check=full rules
...
...
==935== 3 bytes in 1 blocks are definitely lost in loss record 2 of 3
==935==    at 0x401B46D: malloc (vg_replace_malloc.c:149)
==935==    by 0x40D1B9F: strdup (in /lib/tls/libc-2.3.6.so)
==935==    by 0x8048976: yylex (in /RulezParser/flex_bison_test/rules)
==935==    by 0x804990E: yyparse (in /RulezParser/flex_bison_test/rules)
==935==    by 0x804A498: main (in /RulezParser/flex_bison_test/rules)
...


First I tried to solve this problem with the %destructor directive in
bison, which apparently did not work.
Then I solved the problem in the %action section in bison like this:

WORD EQUAL WORD { $$ = !strcmp( $1, $3 ); free($1); free($3); }

This seems to work cause valgrind does not show any memory leak anymore.
But will this work in case of an syntax error ?
And another question arises: What does the %destructor do and how does it
work ?


Can anyone with a bit of knowledge/experience shed some ligth on this
issue
?

thanks,
arno

---------------------->8 snip 8<-----------------------





reply via email to

[Prev in Thread] Current Thread [Next in Thread]