From: Timothy Y. Chow
Subject: [Bug-gnubg] Re: Rollout jsd, statsig etc. [LONG]
Date: Mon, 16 Nov 2009 13:48:45 -0500 (EST)
Massimiliano Maini <address@hidden> wrote:
> Why don't we show the % instead of the JSD ? It's much more reasonable.
The trouble with this is that the percentages don't mean what you think
they mean.
In the bgonline thread, some people got the misimpression that the
points I was making were philosophical ones, and that I was arguing
as a Bayesian. Before I go any further, let me state clearly at the
outset that the points I am about to make are *strictly from the point
of view of classical hypothesis testing*. I am *not* going to argue
here that a Bayesian approach is better. Instead, I am just going to
clear up some common misconceptions about what confidence intervals
mean.
> Notice that the percentage shown aside the top play is the
> "confidence" we have in it being better than the 2nd best play.
This is not correct. It is an extremely common misconception. The
percentage is the probability that we would see the results that we in
fact see (or even more skewed results), *under the assumption that the
plays are equal*. This is *not* the same as the *confidence we have
that the first play is better than the second play*.
I will state this again because it is so counterintuitive. We would like
to think that "5%" is the probability of some event occurring in the real
world. But *it's not*. 5% is the probability that, in the strange and
implausible world where *the two plays are equal*, something as skewed as
what we see (or something even more skewed) would occur. It is tempting,
*but wrong*, to twist this statement around into something like, "There is
a 5% probability that the lower-ranked play is better." THIS IS WRONG.
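To make the distinction concrete, here is a minimal Python sketch. It is
not taken from GNU Backgammon; the function names, the normal approximation,
and the numbers are all assumptions chosen for illustration. The first
function computes the tail probability described above. The second is *not*
an argument for a Bayesian analysis: it simply imagines a toy world in which
the true equity differences between pairs of plays are scattered as
Normal(0, prior_sd^2), so that the fraction of cases where the lower-ranked
play is truly better can be worked out exactly.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tail_probability(observed_diff, std_err):
    """P(a difference at least this large | the two plays are truly equal)."""
    return 1.0 - normal_cdf(observed_diff / std_err)


def prob_lower_ranked_is_better(observed_diff, std_err, prior_sd):
    """P(true difference < 0 | observed difference).

    ASSUMPTION (hypothetical toy world): true differences are drawn from
    Normal(0, prior_sd^2) and the rollout adds Normal(0, std_err^2) noise.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + std_err ** 2)
    post_mean = weight * observed_diff          # shrunk toward zero
    post_sd = math.sqrt(weight) * std_err       # spread of the true difference
    return normal_cdf(-post_mean / post_sd)


if __name__ == "__main__":
    diff, se = 0.01645, 0.01  # made-up rollout result, 1.645 j.s.d.
    print("tail probability:", round(tail_probability(diff, se), 3))
    for prior_sd in (0.001, 0.01, 0.1):
        p = prob_lower_ranked_is_better(diff, se, prior_sd)
        print("prior_sd =", prior_sd, "-> P(lower-ranked better) =", round(p, 3))
```

The tail probability depends only on the observed difference and its
standard error, whereas the second quantity moves as the assumed spread of
true differences changes. The rollout output alone therefore cannot tell
you the second number.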
Given that it's wrong to say this in the case of just two plays, it
follows that describing the multivariate tail probability as "the
probability that the third-ranked play is the best" (in the case of more
than two plays) *is also wrong*, for the same reason.
I strongly believe that GNU Backgammon should not say things that are just
plain wrong, and should not perpetuate common statistical misconceptions.
Now, I happen to believe that percentages are more intuitive than j.s.d.
numbers, and I am in favor of reporting things as percentages rather than
as j.s.d. numbers. However, the percentages should *not* be incorrectly
described as "probabilities that this play is the best."
*If* one insists on having GNU Backgammon issue claims of the form, "the
probability that this play is the best is X%," *then* one should adopt a
Bayesian standpoint. But I promised to speak strictly from the point of
view of classical hypothesis testing, so I will just say that statements
of the form "the probability that this play is the best is X%" are simply
*impossible* from this viewpoint. The multivariate tail probability, for
example, tells you only the probability that some strange event will occur
*under the assumption that the equities are equal to the estimated
equities*. This is *not* the same as *the probability that the true
equities are different from their estimated values*.
If you don't believe that what I am saying here is as clear-cut as I am
claiming it is, then check with a statistician. And when I say
"statistician," I don't just mean a scientist who uses statistics on a
regular basis. I recently learned of a study where 70 academic
psychologists were quizzed on what confidence intervals meant, and only
3 out of the 70 got it right. (Oakes, Statistical inference: A
commentary for the social and behavioral sciences, Wiley, 1986.)
Tim