[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Bug-gsl] inaccurate tabulated values in gsl-1.11/randist/gausszig.c
From: |
Joseph Pang |
Subject: |
Re: [Bug-gsl] inaccurate tabulated values in gsl-1.11/randist/gausszig.c |
Date: |
Tue, 6 May 2008 19:29:05 -0700 (PDT) |
Hi Brian,
Thanks for looking into this matter.
Yes, my comments still stand. Using an approximate sampling for the tail of
the base layer (or strip) should not be muddled with the pre-calculations of
the Ziggurat layers. In fact, I would have far less objection if the
pre-calculations were done with the exact erfc function, while keeping the
current approximate algorithm for the tail generation intact. Actually, this
is what I was suggesting in my original email.
For the following discussion, let us define
PARAM_R_APPROX = 3.44428647676 // current approx value
PARAM_R_EXACT = 3.442619855899 // exact value
A. With PARAM_R_APPROX, we are actually approximating the whole Gaussian
distribution with an approx distribution that has the same shape as Guassian up
to PARAM_R_APPROX, and then an exponential tail afterward. This approximate
distribution does not match the original Gaussian between
0 and PARAM_R_APPROX. They differ by a constant factor. This factor is close
to but not equal to 1.0.
B. With PARAM_R_EXACT and the approx exponential tail algorithm, we make no
approximation to the Gaussian distribution between 0 and PARAM_R_EXACT. But we
are approximating the conditional (the condition being that the random variable
is greater that PARAM_R_EXACT) distribution of the tail to the shape of an
exponential. (BTW, this approximation implies a discontinuity of the density
function at PARAM_R_EXACT, but that is beside the point)
Under this light, it is clear that alternative A is worse than B. From a
conceptual standpoint, it is much worse because we are sacrificing the accuracy
of the whole distribution just for the tail approximation. From a numerical
standpoint, it is not clear how large is the impact to user programs. My guess
is not a whole lot with the current parameters (ie 128 layers), but I wouldn't
vouch for that. Yet, I believe it is to the best interest of the GSL community
to have this fixed, in the spirit of keeping Open Source software the highest
quality through collaboration. This kind of discrepancy is virtually
impossible to spot for Closed Source software.
Regards.
Joseph Pang
Brian Gough wrote:
At Fri, 2 May 2008 10:22:40 -0700 (PDT),
Joseph Pang wrote:
The tables in the above file, implementing the Ziggurat method for
Gaussian, carry inaccurate numerical values. The problem is most
likely due to the inaccurate value of PARAM_R, which is defined as
3.44428647676.
The accurate value should be 3.442619855899. Even though the
difference seems small, it affects all the values in all the tables
to various degree. The extent of this impact to the accuracy of
user programs is not known. By the way, the accurate value did
appear in the original paper by Marsaglia and Tsang.
The reason for the inaccurate values is due to the use of erfc
approximation in the original C program that generates the Ziggurat
tables ( I dug up that C program somewhere from the Web). It is
easy to fix by using the accurate erfc function in this program to
re-generate the tabulated values.
Hello,
Thanks for your email. In the gausszig.c file there was a note which
says
* 2) use an acceptance sampling from an exponential wedge
* exp(-R*(x-R/2)) for the tail of the base strip to simplify the
* implementation. The area of exponential wedge is used in
* calculating 'v' and the coefficients in ziggurat table, so the
* coefficients differ slightly from those in the Marsaglia and
* Tsang paper.
Do your comments still stand after taking this into account?
- Re: [Bug-gsl] inaccurate tabulated values in gsl-1.11/randist/gausszig.c,
Joseph Pang <=