pspp-dev

Covariance Matrices


From: John Darrington
Subject: Covariance Matrices
Date: Wed, 20 Aug 2008 08:45:00 +0800
User-agent: Mutt/1.5.13 (2006-08-11)

I've been thinking about how to implement factor analysis.

One thing that's clearly required is a covariance matrix. We currently
have src/math/covariance-matrix.[ch] which should do the job, but I
think it can be improved.  The current interface has the function

  void covariance_pass_two (struct design_matrix *cov,
                            double mean1, double mean2,
                            double ssize,
                            const struct variable *v1,
                            const struct variable *v2,
                            const union value *val1,
                            const union value *val2);

1.  As I see it, mean1 and mean2 are not necessary.  The traditional
    definition of cov(x,y) is, up to the normalising factor 1/(n-1),
      \sum_i (x_i - \bar{x})(y_i - \bar{y}),
    but this can be expanded to
      \sum_i x_i y_i - \bar{x} \sum_i y_i - \bar{y} \sum_i x_i
        + n \bar{x} \bar{y},
    which doesn't have any \bar components inside a \sum, so the
    sums can be accumulated as we go, and the means applied post hoc.

2.  I'm not sure what the ssize parameter is for.  It's only used in
    the case where at least one variable is alpha (and I've not yet
    worked out in my mind what meaning a covariance matrix for string
    variables can have).  But in glm.q I see that it's being passed
    the same value in all invocations.  Is this going to be generally
    true?  If so, then we can make ssize a member of cov, and set it
    at construction time.

3.  Rather than passing in val1 and val2, I suggest that we pass in
    a (struct ccase *), and index into the values inside the function,
    using v1 and v2.
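
To illustrate point 1, here is a minimal sketch of accumulating the
raw sums in a single pass and applying the means afterwards.  It is
not PSPP code: struct cov_sums and both functions are hypothetical
names invented for this example, plain doubles stand in for case
values, and the result is normalised by n - 1.

```c
#include <assert.h>

/* Hypothetical accumulator for one pair of variables; not PSPP code. */
struct cov_sums
  {
    double n;        /* Number of cases seen so far. */
    double sum_x;    /* \sum x_i */
    double sum_y;    /* \sum y_i */
    double sum_xy;   /* \sum x_i y_i */
  };

/* Adds one observation (x, y) to the running sums.
   No mean is needed at this stage. */
void
cov_sums_add (struct cov_sums *s, double x, double y)
{
  s->n += 1.0;
  s->sum_x += x;
  s->sum_y += y;
  s->sum_xy += x * y;
}

/* Applies the means post hoc.  The expansion collapses to
   \sum x_i y_i - n \bar{x} \bar{y}, which is then divided by n - 1. */
double
cov_sums_result (const struct cov_sums *s)
{
  assert (s->n > 1.0);
  return (s->sum_xy - s->sum_x * s->sum_y / s->n) / (s->n - 1.0);
}
```

One caveat with this form: when the means are large compared with the
spread of the data, the final subtraction can lose precision, so a
real implementation may want a more careful update rule.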


If these suggestions are implemented, then this function becomes

  void covariance_accumulate (struct design_matrix *cov,
                              const struct variable *v1,
                              const struct variable *v2,
                              const struct ccase *c);

Typically, this will be called in two nested loops, thus:


  for (; casereader_read (reader, &c); case_destroy (&c))
    {
      for (i = 0; i < n_all_vars; ++i)
        {
          const struct variable *v = all_vars[i];
          for (j = i; j < n_all_vars; ++j)
            {
              const struct variable *w = all_vars[j];
              covariance_accumulate (X, v, w, &c);
            }
        }
    }

But all_vars is also passed in at construction time, so we might as well
put the two loops inside the function, and not bother with v1 and v2
as parameters.  Then we have simply:


  void covariance_accumulate (struct design_matrix *cov,
                              const struct ccase *c);
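
To make the shape of that interface concrete, here is a rough,
self-contained sketch.  It is only an illustration under stated
assumptions: struct cov_acc and the cov_acc_* functions are
hypothetical names, a case is modelled as a plain array of doubles
rather than a struct ccase, and only numeric variables are handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the covariance object; the real
   struct design_matrix and struct ccase are not modelled here. */
struct cov_acc
  {
    size_t n_vars;   /* Number of variables, fixed at construction. */
    double n;        /* Number of cases accumulated so far. */
    double *sum;     /* sum[i] holds the running sum of variable i. */
    double *prod;    /* prod[i * n_vars + j] holds the running sum of
                        products of variables i and j, for j >= i. */
  };

/* Creates an accumulator for N_VARS variables, or returns NULL. */
struct cov_acc *
cov_acc_create (size_t n_vars)
{
  struct cov_acc *a = malloc (sizeof *a);
  if (a == NULL || n_vars == 0)
    {
      free (a);
      return NULL;
    }
  a->n_vars = n_vars;
  a->n = 0.0;
  a->sum = calloc (n_vars, sizeof *a->sum);
  a->prod = calloc (n_vars * n_vars, sizeof *a->prod);
  if (a->sum == NULL || a->prod == NULL)
    {
      free (a->sum);
      free (a->prod);
      free (a);
      return NULL;
    }
  return a;
}

/* Both loops live here, so the caller passes only the case C. */
void
cov_acc_accumulate (struct cov_acc *a, const double *c)
{
  size_t i, j;

  a->n += 1.0;
  for (i = 0; i < a->n_vars; i++)
    {
      a->sum[i] += c[i];
      for (j = i; j < a->n_vars; j++)
        a->prod[i * a->n_vars + j] += c[i] * c[j];
    }
}

/* Sample covariance of variables I and J, with the means applied
   post hoc from the stored sums. */
double
cov_acc_get (const struct cov_acc *a, size_t i, size_t j)
{
  size_t lo = i < j ? i : j;
  size_t hi = i < j ? j : i;

  assert (a->n > 1.0);
  return (a->prod[lo * a->n_vars + hi]
          - a->sum[lo] * a->sum[hi] / a->n) / (a->n - 1.0);
}

/* Frees the accumulator and its arrays. */
void
cov_acc_destroy (struct cov_acc *a)
{
  free (a->sum);
  free (a->prod);
  free (a);
}
```

With this shape, a procedure's data pass shrinks to a single call per
case, and only the upper triangle of the matrix is ever updated.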


I can see that there are a lot of procedures which will need 
covariance matrices, so the simpler we make their creation, the better.


How many of these suggestions make sense?

J'

--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

