pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Optimisation of statistic calculations


From: John Darrington
Subject: Re: Optimisation of statistic calculations
Date: Thu, 4 Nov 2004 07:25:05 +0800
User-agent: Mutt/1.3.28i

On Wed, Nov 03, 2004 at 09:55:08AM -0800, Ben Pfaff wrote:
     
     Redundant time, i.e. calculating the same thing more than once
     for a single variable or data set, seems to me like a separate
     issue that needs to be addressed separately.  I haven't thought
     about this much at all.  Your DAG sounds like a pretty general
     scheme for addressing this.  I assume that it's basically a
     dependency graph: to calculate C we need to know A and B.  But
     it's not obvious to me why we need two DAGs; a Makefile is the
     same kind of dependency graph as far as I can tell and there's
     only one DAG involved in that case.  Can you explain further?


Consider the case of calculating variance.  Variance is defined as
         
         var = Sum (x_i - x_bar)^2 / n

where:
        x_bar = Sum(x_i)/n

(We know there's a more efficient formula, but I'll ignore that for
the benefit of this explaination)


So var depends upon x_bar and n.  However it doesn't need to known n
until it has iterated through all the cases, whereas it needs to know
x_bar before it even starts the computation.    Therefore, we have a
dependency graph which looks like:

   var        x_bar
    |          /\
    n         n  x_sum

(it happens to be disconnected in this case)
AND a pre_dependency graph:

    var
     |
    x_bar

Like I said, there are ways to calculate variance that don't need any
pre_dependencies, but there are other stats for which there isn't.
The Levene statistic is one of them.  Thus, you will notice that
levene.c makes two passes through the data.

Does this make sense?



-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://wwwkeys.pgp.net or any PGP keyserver for public key.


Attachment: pgpac30u22ixQ.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]