
Re: [gnugo-devel] Improvement of the hash table in GNU go


From: Inge Wallin
Subject: Re: [gnugo-devel] Improvement of the hash table in GNU go
Date: Wed, 5 Feb 2003 20:29:53 +0100

Gunnar wrote:
> Inge wrote:
> > 1. We have to split the memory available to us into an area with
> >    positions and another one with read results.  One of these will run
> >    out before the other one does, which will waste memory.  We can't
> >    know in advance which will actually run out first.
> 
> This is probably not a very big waste. It just looks ugly.

I think you are right.  The waste is probably pretty small.

> > 2. The pointers that we use to link the read results together waste
> >    memory.
> 
> We still need some solution to handle table collisions, where linked
> lists are one possibility. The new scheme allows more options though.

A standard transposition table as it is implemented in, say, a chess
program handles collisions by ignoring them (sort of).  That is, if we
have a collision we decide which of the two (the old one or the new
one) can stay and which has to go.  We don't bother to store
both since the one that gets thrown out will be inexpensive to
recalculate anyhow.  The reason for this is that the children of the
node will most likely still be in the table.

Many criteria for choosing one node or the other have been proposed,
and there are also more advanced solutions like two-level entries.
Basically this is a solved problem.  For details I refer to a PhD
thesis by a guy named Breuker.  It can be found on the net via Google
or on the web site of the University of Maastricht.

I plan to implement the replacement scheme that he calls TWO-BIG1.  It
is a scheme where every cache node has two entries: 

1. The one with the biggest amount of work invested in it.
2. The most recent one.

The work is counted as the number of nodes that were investigated to
get the result stored in the entry.  In our case we can use the same
measurement directly; for owl and semeai nodes we simply add the
tactical reading nodes spent on getting the result of this owl/semeai
node, plus those spent on the nodes leading up to it in the owl
search.

When we get a hash collision and the node is full, i.e. both entries
are filled, we compare the new one with the first one.  If the amount
of work needed to recalculate the new entry is bigger, we replace the
first entry, otherwise we replace the second entry.  This means that
the new entry becomes the new "most recent" one.
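
To make this concrete, here is a minimal sketch in C of what such a
two-entry node and the replacement step could look like.  Everything
here (Hashvalue, cache_entry, cache_store and the convention that
key == 0 means an empty slot) is my own illustration, not existing
GNU Go code:

typedef unsigned long long Hashvalue;   /* assumed 64-bit hash type */

struct cache_entry {
  Hashvalue key;      /* full 64-bit hash of position + question asked */
  int       result;   /* packed read result */
  int       work;     /* nodes spent computing this result (for owl/semeai
                         entries this includes the tactical nodes) */
};

struct cache_node {
  struct cache_entry entry[2];  /* [0] = biggest work, [1] = most recent */
};

static void
cache_store(struct cache_node *node, Hashvalue key, int result, int work)
{
  struct cache_entry new_entry;
  new_entry.key    = key;
  new_entry.result = result;
  new_entry.work   = work;

  if (node->entry[0].key == 0)
    node->entry[0] = new_entry;   /* first slot still free */
  else if (node->entry[1].key == 0)
    node->entry[1] = new_entry;   /* second slot still free */
  else if (work > node->entry[0].work)
    node->entry[0] = new_entry;   /* more work than the "big" entry
                                     (a variant would demote the old
                                     entry[0] into entry[1] instead) */
  else
    node->entry[1] = new_entry;   /* becomes the new "most recent" one */
}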

Breuker did a lot of experiments on transposition tables, and this
scheme is the one that led to the most savings.

> > 4. To remedy the lack of a replacement scheme we currently have to
> >    call hashtable_partially_clear() when the table gets full.  This
> >    function clears the hash table of "inexpensive" entries (tactical
> >    read results) and keeps the expensive ones - owl reading and (I
> >    think) semeai reading.  However, in the process we throw away a lot
> >    of valuable information.  There is also a problem with "open
> >    nodes", i.e. nodes that represent reading that is being
> >    performed at the time the cleaning is run.  These open nodes
> >    complicate things.
> 
> We would still need to be able to handle open nodes.

Actually, we wouldn't.  We wouldn't have to "reserve space", as it
were, for the open nodes in advance.

> > Take the current hash value for the position and xor it with
> > attack_table[A1] (pardon the pseudo code here).  The value that we get
> > can be immediately used to look up the entry in the hash table and see
> > if it is there.  We no longer have to split things into positions and
> > read results.
> 
> Immediately used? You would still need to map it to the size of the
> actual table and handle collisions with some scheme.

Of course, but this could be a simple remainder calculation on the
size of the hash table.  Regarding collisions, see above.
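
For illustration, and reusing the types from the sketch above, the
lookup step might look roughly like this.  Again, every identifier is
an assumption; routine_key stands for whatever random key belongs to
the question being asked, e.g. attack_table[A1]:

static struct cache_node *
cache_lookup_node(struct cache_node *table, unsigned int num_nodes,
                  Hashvalue position_hash, Hashvalue routine_key)
{
  /* Combine the position hash with the key for this particular
     question (e.g. "does an attack on A1 work?"). */
  Hashvalue combined = position_hash ^ routine_key;

  /* The remainder calculation mentioned above maps the 64-bit value
     onto the actual table size. */
  unsigned int index = (unsigned int) (combined % num_nodes);

  /* The caller still compares the full "combined" value against the
     keys stored in the node's two entries, so two different questions
     landing on the same index are told apart unless their 64-bit
     hashes collide. */
  return &table[index];
}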

> I think the main attraction of this scheme is that it can make the
> code simpler, and possibly faster. Better use of memory is certainly
> nice, but generally speaking the size of the cache doesn't seem to be
> very critical at all currently.

You might be right there.  At some point the whole search tree fits in
memory, after which bigger tables buy us nothing.

> My intuition is that this scheme wouldn't significantly increase the
> risk for collisions, but I wouldn't mind being convinced by some
> actual calculations.

I have to refer back to Breuker here.  The only thing that would lead
to an increased risk of collisions is that we might be able to fit
more entries in the table.  This increase is very small, and 64 bits
will still be more than enough.
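
As a rough back-of-the-envelope check (my own, not Breuker's): with a
64-bit hash and N distinct results stored, the birthday bound puts the
chance that any two of them share a hash value at roughly N^2 / 2^65.
Even with N = 2^24, i.e. about 16 million entries, that is around
2^-17, well below one in a hundred thousand, so a few extra entries
in the table really don't change the picture.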

> > (To avoid the problem of "open nodes" we can do this _before_ we start
> > a new tactical reading tree, once the cache is 95% full.)
> 
> I don't believe in this idea. For some cache size and some reading
> complexity (think high levels) the cache will overflow anyway and if
> we can't clear it on the fly the performance hit will be severe. The
> partial clearing scheme for the current cache was a major improvement,
> practically eliminating all cache related problems we were seeing at
> the time.

The beauty of the new scheme is that we don't have to clear the whole
table in one sweep.  We can do it on an entry by entry basis and get a
much better result.

     -Inge



