Re: [Bug-gnubg] Measuring performance levels

From:	Douglas Zare
Subject:	Re: [Bug-gnubg] Measuring performance levels
Date:	Wed, 23 Oct 2002 12:10:12 -0400
User-agent:	Internet Messaging Program (IMP) 3.1

Quoting Joseph Heled <address@hidden>:

> Douglas Zare wrote:
> >
> > [Snowie 1-ply]
> >           Rollout      Money equity: 0.505
> >                0.1%   3.6%  77.0%    23.0%   7.2%   0.0%
> >                95% confidence interval:
> >                   - money cubeless eq.: 0.505 ±0.013.
>
> at 0ply, 12960 games I got
> 
>   0.07% 3.6% 77.06% 23% 7.08% 0.0%
> 
> So this is the same as the above (SN 3?). I leave higher plies to someone
> with a stronger machine.

It looks the same, but since Snowie does not use variance reduction for 1-ply 
rollouts, it might be that Snowie plays slightly better or slightly worse.

Since the goal of a cubeless money player is to maximize cubeless equity, I 
think the primary indicator of playing strength is the cubeless equity, though 
the bg/g/w distribution is interesting. Here, I see .8073-.3008 = 0.5065 +- ? 
for gnu 0-ply, and 0.505 +- 0.013 (confidence interval) for Snowie 1-ply. 
Whatever the standard error was for gnu, one would need a longer Snowie rollout 
to tell which plays better. I'm running a longer Snowie 1-ply rollout, and 
after about 100,000 games it looks like Snowie 1-ply was lucky in the first 
rollout.
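
The equity arithmetic above can be reproduced with a short sketch. It assumes the six rollout figures are, in order, win backgammon, win gammon, win, lose, lose gammon, lose backgammon, with the gammon figures counting backgammons as well. That reading is an assumption, though it is the one that reproduces 0.8073 and 0.3008. The function name is hypothetical.

```python
def cubeless_equity(win_bg, win_g, win, lose, lose_g, lose_bg):
    """Cubeless money equity from cumulative outcome fractions.

    Assumed convention: each gammon figure includes backgammons, so a
    gammon adds one extra point and a backgammon adds one more on top.
    """
    return (win + win_g + win_bg) - (lose + lose_g + lose_bg)


# gnu 0-ply, 12960 games: 0.07% 3.6% 77.06% 23% 7.08% 0.0%
gnu_0ply = cubeless_equity(0.0007, 0.036, 0.7706, 0.23, 0.0708, 0.0)

# Snowie 1-ply rollout: 0.1% 3.6% 77.0% 23.0% 7.2% 0.0%
snowie_1ply = cubeless_equity(0.001, 0.036, 0.770, 0.230, 0.072, 0.0)

# The gap between the two estimates, to compare with the +-0.013 interval
gap = abs(gnu_0ply - snowie_1ply)

print(round(gnu_0ply, 4), round(snowie_1ply, 4), round(gap, 4))
```

The gap between the two estimates is much smaller than the Snowie interval radius of 0.013, which is why a longer rollout is needed to separate them.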

The other rollouts in my column were 2-ply with 1440 trials (confidence 
interval radius about 0.008), 3-ply with 360 trials (confidence interval 
radius about 0.016), Snowie 4 2-ply with 324 trials, and Jellyfish Level 6 
with 9000 trials. These lengths were chosen because of time constraints, as I 
was past the deadline. I can extend these rollouts, but it would also be a 
good idea to roll out other positions of one-sided or lopsided errors. I think 
it would be more interesting to look at positions with multiple checkers back 
and/or some borne off.
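
As a rough guide to how far the rollouts would need to be extended, here is a minimal sketch. It assumes the trials are independent, so that the confidence interval radius shrinks as 1/sqrt(n), and that the per-trial variance stays the same as the rollout grows. The function names are hypothetical, and a figure measured at one ply level need not carry over to another.

```python
import math


def scaled_radius(known_radius, known_trials, new_trials):
    """Estimate the interval radius at new_trials, assuming 1/sqrt(n) scaling."""
    return known_radius * math.sqrt(known_trials / new_trials)


def trials_needed(known_radius, known_trials, target_radius):
    """Estimate the trials needed to reach target_radius under the same assumption."""
    return math.ceil(known_trials * (known_radius / target_radius) ** 2)


# Halving the 3-ply radius from about 0.016 to about 0.008 takes about 4x the trials.
needed_3ply = trials_needed(0.016, 360, 0.008)

# Radius expected if the 2-ply rollout had been cut from 1440 trials to 360.
radius_2ply_360 = scaled_radius(0.008, 1440, 360)

print(needed_3ply, round(radius_2ply_360, 3))
```

Under these assumptions the two quoted radii are consistent with each other: a quarter of the trials gives about twice the radius.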

By the way, the equities for the four 3-ply rollouts ranged from 0.525 to 
0.542. I think with perfect play the position is a close take/pass decision.

About how much time would a gnu 4-ply (versus 2-ply for the side without 
decisions) rollout of n games take? Can you perform the variance reduction 
using 2-ply evaluations? 

Douglas Zare
