bug-gsl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bug-gsl] bugs in gsl_cdf_hypergeometric_P/gsl_cdf_hypergeometric_Q


From: Stijn van Dongen
Subject: [Bug-gsl] bugs in gsl_cdf_hypergeometric_P/gsl_cdf_hypergeometric_Q
Date: Sat, 9 Feb 2008 13:57:55 +0000
User-agent: Mutt/1.5.9i

         Hi,

I believe there is a bug in the functions gsl_cdf_hypergeometric_P and
gsl_cdf_hypergeometric_Q in gsl-1.10 when computing the midpoint. This can
result in severely wrong probabilities (as a result of computing the
non-optimal tail), including negative probabilities.  A demonstration program
is included below.

Explanation: The relevant statement is

    double midpoint = (int) (t * n1 / (n1 + n2));

The integer multiplication t * n1 may overflow. With certain
input parameters it does; this causes the wrong tail to be computed,
leading to accumulation of rounding errors, and erroneous results
including negative probabilities.  Running the example program yields

   midpoint should be: 3899.52, but it is: 9
   midpoint should be: 3899.52, but it is: 9
   hyper [ > 3511 1.000000006 ] [ <= 3511 -6.46390208e-09 ]

The midpoint is then compared with the variable k which has value 3511,
leading to a choice of the wrong tail to compute.
(the first two lines are from a patched version of cdf/hypergeometric.c that
prints out the midpoint).  I believe that the correct computation of midpoint
should be:

   double midpoint = (1.0 * t * n1) / (n1 + n2);

A (trivial) diff to this effect is attached.


On a side note, is there any knowledge/documentation available on the range
of parameters for which these functions are supposed to give good/reasonable
precision?


regards,
Stijn

Example program (run on a system where integers are 4 bytes):

----------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include <gsl/gsl_cdf.h>

int main
(  int argc
,  char* argv[]
)
   {  int k = 3511
   ;  int n1 = 76200
   ;  int bg = 13250474
   ;  int t = 678090
   ;  double f = gsl_cdf_hypergeometric_Q(k, n1, bg-n1, t)
   ;  double g = gsl_cdf_hypergeometric_P(k, n1, bg-n1, t)
   ;  fprintf(stdout, "hyper [ > %d %.10g ] [ <= %d %.10g ]\n", k, f, k, g)
   ;  return 0
;  }
----------------------------------------

-- 
Stijn van Dongen                >8< -o)   O<  forename pronunciation: [Stan]
Wellcome Trust Sanger Institute,    /\\   Tel: +44-(0)1223-494785
Hinxton, Cambridge, CB10 1SA, UK   _\_/   http://www.sanger.ac.uk/Users/svd/



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

Attachment: hypergeometric.c.diff
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]