bug-gsl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-gsl] inaccurate tabulated values in gsl-1.11/randist/gausszig.c


From: Joseph Pang
Subject: Re: [Bug-gsl] inaccurate tabulated values in gsl-1.11/randist/gausszig.c
Date: Tue, 6 May 2008 19:29:05 -0700 (PDT)

Hi Brian,

Thanks for looking into this matter.

Yes, my comments still stand.  Using an approximate sampling for the tail of 
the base layer (or strip) should not be muddled with the pre-calculations of 
the Ziggurat layers.  In fact, I would have far less objection if the 
pre-calculations were done with the exact erfc function, while keeping the 
current approximate algorithm for the tail generation intact.  Actually, this 
is what I was suggesting in my original email.

For the following discussion, let us define

PARAM_R_APPROX = 3.44428647676  // current approx value
PARAM_R_EXACT = 3.442619855899  // exact value

A. With PARAM_R_APPROX, we are actually approximating the whole Gaussian 
distribution with an approx distribution that has the same shape as Guassian up 
to PARAM_R_APPROX, and then an exponential tail afterward.  This approximate 
distribution does not match the original Gaussian between
0 and PARAM_R_APPROX.  They differ by a constant factor.  This factor is close 
to but not equal to 1.0.

B. With PARAM_R_EXACT and the approx exponential tail algorithm, we make no 
approximation to the Gaussian distribution between 0 and PARAM_R_EXACT.  But we 
are approximating the conditional (the condition being that the random variable 
is greater that PARAM_R_EXACT) distribution of the tail to the shape of an 
exponential.  (BTW, this approximation implies a discontinuity of the density 
function at PARAM_R_EXACT, but that is beside the point)

Under this light, it is clear that alternative A is worse than B.  From a 
conceptual standpoint, it is much worse because we are sacrificing the accuracy 
of the whole distribution just for the tail approximation.  From a numerical 
standpoint, it is not clear how large is the impact to user programs.  My guess 
is not a whole lot with the current parameters (ie 128 layers), but I wouldn't 
vouch for that.  Yet, I believe it is to the best interest of the GSL community 
to have this fixed, in the spirit of keeping Open Source software the highest 
quality through collaboration.  This kind of discrepancy is virtually 
impossible to spot for Closed Source software.

Regards.

Joseph Pang


Brian Gough wrote:    
At Fri, 2 May 2008 10:22:40 -0700 (PDT),
Joseph Pang wrote:

   
The tables in the above file, implementing the Ziggurat method for
Gaussian, carry inaccurate numerical values.  The problem is most
likely due to the inaccurate value of PARAM_R, which is defined as
3.44428647676.

The accurate value should be 3.442619855899.  Even though the
difference seems small, it affects all the values in all the tables
to various degree.  The extent of this impact to the accuracy of
user programs is not known.  By the way, the accurate value did
appear in the original paper by Marsaglia and Tsang.

The reason for the inaccurate values is due to the use of erfc
approximation in the original C program that generates the Ziggurat
tables ( I dug up that C program somewhere from the Web).  It is
easy to fix by using the accurate erfc function in this program to
re-generate the tabulated values.

   

Hello,
Thanks for your email.  In the gausszig.c file there was a note which
says

   * 2) use an acceptance sampling from an exponential wedge
   * exp(-R*(x-R/2)) for the tail of the base strip to simplify the
   * implementation.  The area of exponential wedge is used in
   * calculating 'v' and the coefficients in ziggurat table, so the
   * coefficients differ slightly from those in the Marsaglia and
   * Tsang paper.

Do your comments still stand after taking this into account?


  


reply via email to

[Prev in Thread] Current Thread [Next in Thread]