
[gnugo-devel] Discussion about the new cache implementation


From: Inge Wallin
Subject: [gnugo-devel] Discussion about the new cache implementation
Date: Wed, 23 Jul 2003 06:44:50 -0400 (EDT)

Ok, I am back from my vacation.  I would like to summarize the
comments and see if we agree about what needs to be done.

Heikki wrote:
> The regression differences worry me a bit.  You could add a verify mode
> that always returns "no result" when asked for a cached result, and on
> storing checks the new result against the stored one (if any).  If
> attack(A1) on the same position ever gets different values, we know
> something is wrong.  Possibly a hash collision (however unlikely), or a
> differing result due to hitting a node limit.  These should be easy to
> verify.

The regressions don't worry me very much.  I think they are just a
side effect of the fact that slightly different nodes get tried in the
reading process.  See also the discussion about the 1% - 99% ratio
between owl/semeai/connection nodes and tactical reading nodes below.

The idea for checking the cached results is interesting.
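
Something along these lines, perhaps (a minimal sketch to illustrate
the idea; tt_get(), tt_put() and the raw accessors are hypothetical
names, not code from the patch):

  #include <stdio.h>

  typedef unsigned long long Hashvalue;  /* placeholder hash type */

  /* Stand-ins for the real table accessors. */
  extern int  tt_lookup_raw(Hashvalue key, int *result);
  extern void tt_store_raw(Hashvalue key, int result);

  /* In verify mode, lookups always miss, so every position gets
   * re-read, and each store cross-checks the fresh result against
   * whatever is already in the table for the same position.
   */
  static int
  tt_get(Hashvalue key, int *result)
  {
    (void) key;
    (void) result;
    return 0;                     /* always "no cached result" */
  }

  static void
  tt_put(Hashvalue key, int result)
  {
    int old_result;
    if (tt_lookup_raw(key, &old_result) && old_result != result)
      fprintf(stderr, "cache verify: conflicting results for %llx\n",
              (unsigned long long) key);
    tt_store_raw(key, result);
  }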

Arend wrote:
> I think you made it even harder than necessary: You mixed in two changes
> that are completely unrelated:
> 1. Change of the type of the hash values
> 2. Reorganize the cache table.
>
> No. 1 is debatable at least (it takes away the possibility to easily
> change the number of bits for the hash values if I read your code
> correctly), so maybe a patch to just do no. 2 would be better.
> 
> At least, before we hardwire the hash value to 64 bits, we should do
> some estimates to check that hash collisions won't become too likely
> even once we, say, increase the number of reading nodes by a factor of 10.

It wasn't any harder.  The new type of the hash value is extremely
simple to handle.  The only reason for keeping the old type is that
maybe 64 bit ints aren't supported by some compilers.  I don't buy
that we might want to extend to 96 bits someday.  64 bits are plenty,
and are all that we will ever need.  And no, this is not a case of
"Who would ever need more than 640 KB internal memory".  I did the
calculation you asked for in a mail some time ago, and you can dig it
up if you want the details.
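
For reference, the core of that estimate is a plain birthday-paradox
bound.  A throwaway sketch (not the calculation from the old mail,
just the same kind of arithmetic):

  #include <math.h>
  #include <stdio.h>

  /* Probability of at least one collision among n uniformly
   * distributed 64-bit hash values: p ~ 1 - exp(-n*(n-1) / 2^65).
   */
  static double
  collision_probability(double n)
  {
    return 1.0 - exp(-n * (n - 1.0) / pow(2.0, 65.0));
  }

  int
  main(void)
  {
    printf("n = 1e8: p = %g\n", collision_probability(1e8));  /* ~3e-4 */
    printf("n = 1e9: p = %g\n", collision_probability(1e9));  /* ~3e-2 */
    return 0;
  }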

However, it might be good to use the old hash value for another
reason, namely the fact that some compilers might not support 64-bit
ints.

> >    I hope that some expert in autoconf can devise a test for stdint
> >    so that I can use it if it works and implement another way if there
> >    are only 32 bit integers on the target system.  Arend?
> 
> Oh dear, how did I get myself into that reputation? I think there is
> nothing better than what Wolfgang Manner suggested (or s.th. similar).

That was not exactly autoconf-ready...  :-)
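
Something like this is probably all it takes on the C side, assuming
autoconf's header check defines HAVE_STDINT_H:

  /* A sketch of the fallback under discussion; the macro name is
   * what an autoconf header check would typically define. */
  #ifdef HAVE_STDINT_H
  #include <stdint.h>
  typedef uint64_t Hashvalue;
  #else
  /* No 64-bit type available: fall back to two 32-bit words, as in
   * the old representation. */
  typedef struct {
    unsigned long hashval[2];
  } Hashvalue;
  #endif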

> >    Currently I allocate 1% of the memory to the old hash table and 99%
> >    of it for the new one.
> (...)
> >    I think that owl nodes are more important to save than tactical
> >    reading nodes so instead of 0.2% I allocate 1%, and that seems to
> >    work out pretty well.
> 
> You need to allocate more than 1% for the other caches. They also store the
> connection reading results. (I also think that that's where the
> PASSes/FAILs are coming from, as connection reading also has a node
> limit.)

You might be right.  I could experiment with this to see if it makes
any difference if I use 2% or even 5% for the pattern-based reading.
But I don't think it is necessary.  The old implementation should be
taken out instead, and we should use the new one everywhere.
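
If someone wants to experiment anyway, the knob is just the split of
the memory budget between the two tables (a hypothetical sketch, not
code from the patch):

  #include <stddef.h>

  #define OLD_CACHE_FRACTION 0.02   /* try 2% instead of the current 1% */

  static void
  allocate_caches(size_t total_bytes)
  {
    size_t old_bytes = (size_t) (total_bytes * OLD_CACHE_FRACTION);
    size_t new_bytes = total_bytes - old_bytes;
    /* ... hand old_bytes to the old cache and new_bytes to the new
     * transposition table ... */
    (void) new_bytes;
  }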

> While I really like your patch, I am against putting it into 3.4. We are
> simply too close to release.

I agree.

> > +  /* Return data.  Only set the result if remaining depth in the table
> > +   * is big enough to be trusted.  The move can always be used for move
> > +   * ordering if nothing else.
> > +   */
> > +  if ((unsigned) remaining_depth <= hn_get_remaining_depth(*node)) {
> > +    if (result)
> > +      *result = hn_get_result1(*node);
> 
> That's a subtle change in behaviour that also needs discussion
> (the old one only set the result if "==" held in the above comparison).
> I am not against it, but one should be aware that it makes debugging
> more difficult.

I didn't think of that.  Please discuss away, and let me have the
outcome.  :-)
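
For reference, the old behaviour Arend describes is the same fragment
with "==" instead of "<=":

  /* Old behaviour: only trust a stored result that was searched to
   * exactly the same remaining depth.
   */
  if ((unsigned) remaining_depth == hn_get_remaining_depth(*node)) {
    if (result)
      *result = hn_get_result1(*node);
  }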

> > +  if (move)
> > +    *move = hn_get_move(*node);
> That's a start towards iterative deepening, I suppose? (Yeah, yeah!)

Er, no.  This is not a start towards iterative deepening per se,
although I have ideas in that direction as well.  I think the new
transposition table will make it easier to do iterative deepening, but
the patch doesn't contain anything in that direction as it is now.
But I understand how the comment could be taken that way, since it is
copied from a program that *does* use iterative deepening.  

> When you described your plans, you wanted to store the "most expensive"
> (in terms of reading nodes) instead of "deepest". Any motivation for
> that change? Also, I find the name "deepest" a bit confusing, because
> you mean the entry where the remaining depth is highest, i.e. more
> likely a node at very shallow depth in the tree.

You are right.  The reason for the change is that I just copied some
code that I already have.  This was a way for me to get something
working as quickly as possible.  And I think it worked pretty well
as I got it all working on a single day.  :-) 

But that is not the whole truth.  If you look in cache.h, you can see
the following comment:

 * The data field packs into 32 bits the following
 * fields:
 *
 *   RESERVED       :  5 bits
 *   remaining_depth:  5 bits (depth - stackp)
 *   result1        :  4 bits
 *   result2        :  4 bits
 *   move           : 10 bits
 *   flags          :  4 bits
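
The accessors for this field might look roughly like this (the bit
positions and the .data member are just for illustration; the real
macros live in cache.h):

  /* Assumed layout, high bits to low:
   *   [31-27] reserved   [26-22] remaining_depth   [21-18] result1
   *   [17-14] result2    [13-4]  move              [3-0]   flags
   */
  #define hn_get_remaining_depth(hn)  (((hn).data >> 22) & 0x1f)
  #define hn_get_result1(hn)          (((hn).data >> 18) & 0x0f)
  #define hn_get_result2(hn)          (((hn).data >> 14) & 0x0f)
  #define hn_get_move(hn)             (((hn).data >>  4) & 0x3ff)
  #define hn_get_flags(hn)            ((hn).data & 0x0f)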

The problem is the available space to store the data.  You can see
that the remaining depth is stored in 5 bits, i.e. the maximum depth
codable in this way is 31.  This is OK for depth, but not for storing
the number of nodes.  Even if we also grab the 5 reserved bits, we can
only store values up to 1023 - still not enough.

What I could do is store kilonodes instead of nodes, but who says that
1 million nodes would be enough for any reading?  The engine currently
does about 200,000 tactical nodes per second (figure from Gunnar's
computer), so 1 million nodes is approximately 5 seconds of reading.
Granted, this is a lot for a single call to, say, owl_attack(), but not
completely unthinkable, especially on the computers of tomorrow, where
we might easily do several million nodes per second.
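
Packing the cost as kilonodes would then be a one-liner (hypothetical
helper, not code from the patch):

  /* Round the cost to the nearest multiple of 1024 nodes and saturate
   * at 10 bits, i.e. at 1023 kilonodes (~1 million nodes). */
  static unsigned int
  pack_cost_kilonodes(unsigned int nodes)
  {
    unsigned int kn = (nodes + 512) / 1024;
    return kn > 1023 ? 1023 : kn;
  }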

What should be done for owl nodes, then, which potentially represent
thousands of tactical nodes each?  One way to deal with this is to make
a rough estimate of the branching factor of the tactical reading and
then add some depth to all nodes representing pattern-based reading.
For instance, if an owl node is at depth 3 and the remaining depth is
5, we could store the value with remaining depth 15 instead of 5.  That
way, we make it likely that the owl nodes won't get replaced in the
table.  The real offset must of course be estimated from measurements
on real games.
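
In code, the offset idea itself is trivial (OWL_DEPTH_OFFSET is a
made-up value; the real one would come from those measurements):

  #define OWL_DEPTH_OFFSET 10

  static unsigned int
  effective_remaining_depth(unsigned int remaining_depth, int is_owl_node)
  {
    if (is_owl_node)
      return remaining_depth + OWL_DEPTH_OFFSET;  /* e.g. 5 -> 15 */
    return remaining_depth;
  }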

In Breuker's Ph.D. thesis, he compares a number of replacement schemes.
The one I suggested, TWOBIG, was the best.  The one implemented here,
TWODEEP, was the second best by a very small margin.  The weakness of
Breuker's comparison is that he doesn't tell us how he uses the
available storage; he just compares the efficiency for a given
transposition table size measured in number of nodes.  So it is
possible that if we take into account the extra storage needed by the
TWOBIG scheme, TWODEEP is actually better.

I cannot tell, but he shows that TWODEEP is worse by so little that it
hardly matters.  These two schemes are far better than all the other
schemes he tried.
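
For concreteness, TWODEEP with two entries per slot boils down to
something like this (names hypothetical, not from the patch):

  typedef unsigned long long Hashvalue;  /* placeholder hash type */

  struct tt_entry {
    Hashvalue key;
    unsigned int remaining_depth;
    unsigned int data;            /* packed result, move, flags, ... */
  };

  /* One slot holds two entries: "deep" keeps the entry with the
   * greatest remaining depth seen so far, "always" is replaced
   * unconditionally.
   */
  struct tt_slot {
    struct tt_entry deep;
    struct tt_entry always;
  };

  static void
  tt_store(struct tt_slot *slot, struct tt_entry new_entry)
  {
    if (new_entry.remaining_depth >= slot->deep.remaining_depth) {
      slot->always = slot->deep;  /* demote the old deep entry */
      slot->deep = new_entry;
    }
    else
      slot->always = new_entry;
  }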

> I think the transposition table should rather be a private variable
> inside cache.c, just as the old one.

Maybe.  I have no preference about this.

Ok, that was it.  If nobody comes up with good objections, I think I
will use the following plan:

1) Go back to the old type of hash value (two 32-bit ints).
2) Add more calls to the transposition table in other reading
   functions than attack() and find_defense().
3) Establish some figures for the offset for owl nodes, semeai nodes
   and connection nodes.
4) Implement the new cache scheme for these types of reading.
5) Remove the old cache.

If anybody wants to help me, I would be especially grateful if you
would do some work on point 3 above.  Or maybe suggest another way to
solve the problem.

      -Inge



