gnugo-devel

RE: [gnugo-devel] More on minimum values


From: Evan Berggren Daniel
Subject: RE: [gnugo-devel] More on minimum values
Date: Sat, 15 Feb 2003 14:16:31 -0500 (EST)

What sorts of statistical measures are you using? These statistics are
amazingly hard to get right if you do anything more complex than running a
single series and calculating a p-value against the null hypothesis p = .5.
I'm not saying you're doing things incorrectly, just that my intuition, at
least, is very bad, and I would be very skeptical of any results I produced
without carefully checking the requirements of every statistical procedure
I used. Then again, I don't do statistics that much; I just had a teacher
who emphasized the need to carefully check every single assumption when
doing any test.

For statistics on a simple 100-game series:

sd = sqrt(n * p * (1 - p))

n is the number of games and p is the win probability under the null
hypothesis. Here the null hypothesis is that the patched and unpatched
versions are equally strong, so p = .5.

sd = sqrt(100 * .5 * .5) = 5.

67 - p * 100 = 17

17 = 3.4 * sd

gives a two-sided p-value of about 0.0007, i.e. the null hypothesis can be
rejected at better than the 99.9% confidence level.

That is statistically significant for the vast majority of purposes, and
therefore I would contend that my patch actually changed something.
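The normal-approximation arithmetic above can be reproduced with a short
Python sketch. It assumes a two-sided test and the same
sd = sqrt(n * p * (1 - p)) formula, and it adds an exact binomial tail sum
(fair-coin null only) as a cross-check on the approximation. The function
names are hypothetical and not part of GNU Go or twogtp.

```python
import math


def binomial_z_test(wins: int, n: int, p: float = 0.5) -> tuple[float, float, float]:
    """Normal-approximation test of a win count against a null win rate p.

    Returns (sd, z, two_sided_p_value).
    """
    sd = math.sqrt(n * p * (1 - p))        # sd = sqrt(n * p * (1 - p))
    z = (wins - n * p) / sd                # distance from the expected count, in sd units
    # Two-sided standard normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return sd, z, p_value


def exact_binomial_p_value(wins: int, n: int) -> float:
    """Exact two-sided p-value under a fair-coin (p = 0.5) null hypothesis."""
    k = max(wins, n - wins)                # count on the more lopsided side
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)              # symmetric null, so double one tail


if __name__ == "__main__":
    # 67/100 is the series above; 229/400 is the figure used for the 4x100 match.
    for wins, n in [(67, 100), (229, 400)]:
        sd, z, p_value = binomial_z_test(wins, n)
        exact = exact_binomial_p_value(wins, n)
        print(f"{wins}/{n}: sd = {sd:g}, z = {z:.1f}, "
              f"approx p = {p_value:.4f}, exact p = {exact:.4f}")
```

For 67/100 this reproduces sd = 5 and z = 3.4. For 229/400 it gives sd = 10
and z = 2.9. The exact binomial value is slightly larger than the normal
approximation, as expected without a continuity correction, but the
conclusion is the same.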

The equivalent calculation on your data:

sd = sqrt(400 * .5 * .5) = 10

229 - 200 = 29

29 = 2.9 * sd

2.9 standard deviations gives a two-sided p-value of about 0.004, i.e.
significance at roughly the 99.6% confidence level. From this I would
conclude that your patch changed something. Of course, for that to be
valid, the relevant assumptions must hold. The only case where this might
not hold is if you decided when to stop the series after some games had
already been played.
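The caveat about choosing the stopping point is the classic
optional-stopping problem. The minimal simulation below illustrates it under
the assumption that both programs are exactly equal, so every game is a fair
coin flip. It compares testing once after a fixed number of games with
checking after every game and stopping as soon as |z| crosses the threshold.
The parameters (400 games, the first 20 games skipped, a 1.96-sd threshold,
i.e. nominal 5%) are arbitrary illustrative choices, not anything from the
original test setup.

```python
import math
import random


def z_score(wins: int, n: int, p: float = 0.5) -> float:
    """Standardized distance of a win count from its expectation under the null."""
    return (wins - n * p) / math.sqrt(n * p * (1 - p))


def false_positive_rates(trials: int = 2000, max_games: int = 400,
                         min_games: int = 20, z_crit: float = 1.96,
                         seed: int = 1) -> tuple[float, float]:
    """Simulate series between two equally strong programs.

    Returns (fixed_rate, peeking_rate):
      fixed_rate   -- fraction of series judged significant when tested once,
                      after max_games
      peeking_rate -- fraction judged significant when the experimenter checks
                      after every game (from min_games on) and stops at the
                      first |z| >= z_crit
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(trials):
        wins = 0
        ever_significant = False
        for n in range(1, max_games + 1):
            if rng.random() < 0.5:          # one game between equal programs
                wins += 1
            if n >= min_games and abs(z_score(wins, n)) >= z_crit:
                ever_significant = True     # a peeking experimenter would stop here
        if ever_significant:
            peeking_hits += 1
        if abs(z_score(wins, max_games)) >= z_crit:
            fixed_hits += 1
    return fixed_hits / trials, peeking_hits / trials


if __name__ == "__main__":
    fixed, peeking = false_positive_rates()
    print(f"test once after 400 games:        {fixed:.3f}")
    print(f"stop at first |z| >= 1.96 (peek): {peeking:.3f}")
```

Because the peeking strategy also checks after the final game, it always
rejects at least as often as the single test. With these settings the single
test rejects a true null roughly 5% of the time, while peeking rejects it
substantially more often. The exact figures depend on the seed and
parameters.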

Evan Daniel

On Sat, 15 Feb 2003, Portela Fernand wrote:

> Evan wrote:
>
> > (...) I ran a 100 game series between the unpatched and patched versions.
> > The patched version won 67 to 33.
>
> I just finished running a 4x100 match, also between an unpatched and a
> patched version (the nature of this patch doesn't matter, although it's
> also something related to move valuations, so only really measurable with
> twogtp matches), and exchanging colors every 100 games.
>
> Series 1 (patch as white): W 53/47
> Series 2 (patch as black): B 60/40
> Series 3 (patch as white): B 63/37
> Series 4 (patch as black): B 53/47
>
> I analyzed the results with statistical tools, and observed that while the
> series played with black (2 & 4) are pretty solid (rather low standard
> deviation), the ones with white (1 & 3) are still too "wild". So there's
> only one valid conclusion for me here: getting statistically valid results
> without running at least 500 games and exchanging colors is probably very
> difficult.
>
> I'm not saying your conclusions are wrong; they are possibly correct. I'm
> only saying that after this experiment, I can't be convinced by the results
> of any twogtp match run with less than the minimums I indicated above.
>
> /nando
>
>
> _______________________________________________
> gnugo-devel mailing list
> address@hidden
> http://mail.gnu.org/mailman/listinfo/gnugo-devel
>
>



