Re: [gnugo-devel] DFA rewrite

From: Evan Berggren Daniel
Subject: Re: [gnugo-devel] DFA rewrite
Date: Mon, 2 Sep 2002 14:56:51 -0400 (EDT)

> Yes, the DFA matcher can be implemented more effectively with a
> global scan!
> It was easier to implement an "anchor based" DFA pattern matcher
> to keep some compatibility with the old pattern matcher.
> Implementing the global DFA matcher is a lot of work,
> but I am sure it can be done
> (and I would be happy to help you as much as possible).

Thanks.  I'll probably need it ;)

> Interface:
> ----------
> The DFA matcher should be more cleanly separated from the rest of gnugo;
> let me propose an interface:
> A "match" is a 3-tuple (pattern_id, position, transformation).
> The main object manipulated is a "match set", which is a set of such
> 3-tuples.

Sounds good.

> The matcher is a function
> match_set_t dfa_matcher(dfal_t dfa_list, match_set_t goal, board_t
> board)
> which takes as arguments a DFA list, a goal subset, and a board position,
> and returns the set of matches that are in goal.

Why do we have the subset?  I can see how it would be useful, but how
would a precomputed full board pattern matcher make effective use of it?
I would think you would need a separate state machine for each possible
subset.  Besides, the point of the full board match is constant running
time across all possible pattern sets (at the cost of increased memory usage).

> Any other work, like constraint solving, should be
> considered separately.
> The old goal system should be precomputed by filtering
> the pattern sets BEFORE reading.
> The use of match sets allows us to implement
> incremental pattern matching.

Again, this should be fast enough that incremental matches are a small
improvement.  Or did you mean doing incremental matches with the current DFA
system, or some new variant that also checks each board position?

> Size consideration:
> -------------------
> 1) The current version uses a special symbol for "OUT BOARD" and wastes
> a lot of memory (and time) scanning the OUT BOARD positions.
> A global scanner would not need that.
> 2) The size of the structure may be controlled by using
> a list of DFAs and a constant DFA_MAX_SIZE to control the size of each
> DFA.
> If DFA_MAX_SIZE is big, then only one huge DFA will be produced;
> if DFA_MAX_SIZE is very small, then one small DFA will be built for each
> pattern.
> The principle:
> --------------
> 1) in mkpat: Build one DFA for each possible match of each pattern.
> +-------------+
> |+++......+++.|
> |++++.....++++|
> |++.......++..|
> |...+........+|
> |.............|
> |........++...|
> |.......+++...|
> |........++...|
> |......+......|
> |.............|
>      ...
> 2) use a synchronized product to combine these DFAs into bigger ones.
> We may combine DFAs by pattern_id, position or transformation.
> The easiest is by pattern_id, but position is also a good criterion.
> > So, my suggestion is to build the DFA to match against the entire board,
> > using a much larger state table but a potential speed increase of a factor
> > of ~30 (the number of squares in the largest pattern).  I think this trade
> > off is worth it.
> >
> There are a lot of patterns with board constraints; this reduces
> the number of possible matches for a given pattern.

This is true.  I think it will help a lot.

> > The only worry would be that state generation would be slow, but this is
> > not a problem.  For testing compiles (eg pattern database tuning), the old
> > DFA could be used, and for release compiles long compile times are ok.
> Incremental DFA compilation is also a solution:
> store the DFAs in binary files and only rebuild them
> as needed.
> >
> > I would like to undertake this if it seems like a useful and workable
> > idea.  I can describe my scheme for generating a compact state table in
> > more detail if needed.
> Ok give your description :)

OK, so this may or may not be a great idea.  My knowledge of DFAs is
mostly from hand tuned small ones in assembly class.  They were fast
though ;)

OK, so my thinking is that folding machines together has a disadvantage for
large state machines: you have, at least temporarily, memory usage
proportional to the product of the number of states in the two
machines, which could be unacceptably large during compilation.

So, we instead build the entire state machine one state at a time.  Start
with state 1, the empty state.  Create 3 states corresponding to whether
the next position is black, white, or empty.  For each of those, check (and
store temporarily) what partial matches it corresponds to.  Store as part
of the machine any final matches it corresponds to.  Check the hash table
of partial matches for anything that is the same partial state; if there
is such a state, we just duplicated it, so throw the new one away.  You've
now created 3 new states; stick them at the end of the state machine and
move on to the next state (state 2, which you just created; create states
5, 6, 7, and then do state 3).

Of course, this mechanism precludes incremental compiles, which is
unfortunate.

I don't know much about the product state machine generation method, maybe
there's an optimization there I haven't thought of to reduce memory usage.
