Re: [Bug-gnubg] Measuring performance levels
From: Douglas Zare
Subject: Re: [Bug-gnubg] Measuring performance levels
Date: Wed, 23 Oct 2002 12:10:12 -0400
User-agent: Internet Messaging Program (IMP) 3.1
Quoting Joseph Heled <address@hidden>:
> Douglas Zare wrote:
> >
> > [Snowie 1-ply]
> > Rollout Money equity: 0.505
> > 0.1% 3.6% 77.0% 23.0% 7.2% 0.0%
> > 95% confidence interval:
> > - money cubeless eq.: 0.505 ±0.013.
>
> at 0ply, 12960 games I got
>
> 0.07% 3.6% 77.06% 23% 7.08% 0.0%
>
> So this is the same as the above (SN 3?). I leave higher plies to someone
> with a stronger machine.
It looks the same, but since Snowie does not use variance reduction for 1-ply
rollouts, it might be that Snowie plays slightly better or slightly worse.
Since the goal of a cubeless money player is to maximize cubeless equity, I
think the primary indicator of playing strength is the cubeless equity, though
the bg/g/w distribution is interesting. Here, I see 0.8073 - 0.3008 = 0.5065 ± ?
for gnu 0-ply, and 0.505 ± 0.013 (95% confidence interval) for Snowie 1-ply.
Whatever the standard error was for gnu, one would need a longer Snowie rollout
to tell which plays better. I'm running a longer Snowie 1-ply rollout, and
after about 100,000 games it looks like Snowie 1-ply was lucky in the first
rollout.
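
The arithmetic above can be reproduced with a short sketch. It assumes the six
rollout figures are ordered backgammon, gammon, win, loss, gammon loss,
backgammon loss, with the win and gammon figures cumulative (a gammon also
counts as a win). The function name is hypothetical, not part of gnubg or
Snowie.

```python
def cubeless_equity(bg, g, w, l, lg, lbg):
    """Cubeless money equity from cumulative outcome probabilities.

    Assumed column order: backgammon, gammon, win, loss, gammon loss,
    backgammon loss. A gammon adds one extra point on top of the win,
    and a backgammon adds one more on top of the gammon.
    """
    return (w + g + bg) - (l + lg + lbg)


# gnu 0-ply, 12960 games (figures from the quoted message)
gnu_0ply = cubeless_equity(0.0007, 0.036, 0.7706, 0.23, 0.0708, 0.0)

# Snowie 1-ply rollout (figures from the quoted message)
snowie_1ply = cubeless_equity(0.001, 0.036, 0.770, 0.230, 0.072, 0.0)

# Radius of Snowie's reported 95% confidence interval
snowie_radius = 0.013

difference = abs(gnu_0ply - snowie_1ply)
print(f"gnu 0-ply:    {gnu_0ply:.4f}")
print(f"Snowie 1-ply: {snowie_1ply:.4f}")
print(f"difference:   {difference:.4f} (interval radius {snowie_radius})")
```

The difference between the two equities is far smaller than the Snowie
interval radius, which is why these rollouts cannot separate the two players.
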
The other rollouts in my column were 2-ply with 1440 trials (confidence
interval radius about 0.008) and 3-ply with 360 trials (confidence interval
radius about 0.016), along with Snowie 4 2-ply with 324 trials and Jellyfish
Level 6 with 9000 trials. These lengths were chosen because of time constraints, as I was past
the deadline. I can extend these rollouts, but it would also be a good idea to
roll out other positions of one-sided or lopsided errors. I think it would be
more interesting to look at positions with multiple checkers back and/or some
borne off.
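
As a rough guide to how far a rollout would need to be extended, here is a
minimal sketch. It assumes independent trials with a fixed per-trial variance,
so that the confidence interval radius shrinks like 1/sqrt(n). Variance
reduction changes the constant but not this scaling. The function name is
hypothetical. The 2-ply and 3-ply figures above happen to fit the same
scaling, although different plies need not share a per-trial variance.

```python
import math


def trials_for_radius(n_ref, radius_ref, radius_target):
    """Estimate the trials needed to reach radius_target.

    Assumes the radius scales as 1/sqrt(n), given a reference rollout of
    n_ref trials whose interval radius was radius_ref.
    """
    return math.ceil(n_ref * (radius_ref / radius_target) ** 2)


# Halving the 2-ply radius from 0.008 to 0.004 needs four times the trials.
halved = trials_for_radius(1440, 0.008, 0.004)

# Consistency check: scaling 1440 trials at 0.008 to a radius of 0.016
# gives the 360 trials quoted for the 3-ply rollout.
doubled = trials_for_radius(1440, 0.008, 0.016)

print(halved, doubled)
```
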
By the way, the equities for the four 3-ply rollouts ranged from 0.525 to
0.542. I think with perfect play the position is a close take/pass decision.
About how much time would a gnu 4-ply (versus 2-ply for the side without
decisions) rollout of n games take? Can you perform the variance reduction
using 2-ply evaluations?
Douglas Zare