pspp-dev

## Re: Oneway Anova from a covariance/cross-product matrix?

 From: John Darrington Subject: Re: Oneway Anova from a covariance/cross-product matrix? Date: Tue, 6 Jul 2010 19:09:05 +0000 User-agent: Mutt/1.5.18 (2008-05-17)

```
So what exactly do we pass to reg_sweep?

Passing M doesn't seem to help. If we need to use g or x, then that
requires access to the raw data. I understood that ANOVA could be calculated
from M alone.

J'

On Tue, Jul 06, 2010 at 11:25:18AM -0400, Jason Stover wrote:
Treat the problem as a regression problem and use the SWEEP operator,
as used in linreg.c and regression.q. More details are below.

On Mon, Jul 05, 2010 at 03:13:03PM +0000, John Darrington wrote:
> The cross-product matrix for this data is:
>
>       x    g1
> x   16.0  3.0
> g1   3.0  1.5

Call this matrix M, and call the column vector transpose ((1,2,3,4,5,6)) x.
We will re-express the group memberships as the following matrix:

1 0
1 0
1 0
1 1
1 1
1 1

So the first column corresponds to a "grand mean" and the second tells
us the group (0 for group a, 1 for group b). Call this matrix 'g'.

To get the sums of squares, you must consider this as a regression problem:

x = g * beta + error

...where beta is a 2x1 matrix of unknown parameters and
'*' denotes matrix multiplication. Another way to write this is x_i =
beta_0 + beta_1 + error for group b, and x_i = beta_0 + error for
group a.

The least-squares estimates of beta_0 and beta_1 must satisfy the
following relation:

transpose (g) * g * beta = transpose (g) * x

...which gives the solution for beta:

beta = (transpose (g) * g)^{-1} * transpose (g) * x

The SWEEP operator will give you both the sums of squares and beta. It
solves the system by a form of Gauss-Jordan elimination, pivoting on
each diagonal element in turn. Solving for beta in the above system isn't
interesting by itself for ANOVA, because of the unusual coding we used, but a
by-product of the SWEEP operator is that the sums of squares are left in
place in the matrix being swept, and these sums of squares are
independent of the coding of g (for example, using the indicator vector
for category a in the second column of the matrix g would have given
the same sums of squares).

The code in regression.q and linreg.c does this, if you want to use
it. It might be more efficient and easier to reach straight for the
code in sweep.c.

-Jason

--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

```
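As a quick numerical check of the regression formulation in Jason's reply, the sketch below builds the design matrix g for his example data x = (1, ..., 6), with rows 4 to 6 in group b, forms transpose (g) * g and transpose (g) * x, and solves the normal equations for beta. It is a minimal plain-Python sketch, not PSPP's C code; the helper names `normal_equations` and `solve_2x2` are hypothetical, and the 2x2 system is solved with an explicit inverse only because it is so small.

```python
# Jason's example data: x = (1, ..., 6), with rows 4-6 in group b.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
in_group_b = [0, 0, 0, 1, 1, 1]

# Design matrix g: column 0 is the "grand mean", column 1 flags group b.
g = [[1.0, float(b)] for b in in_group_b]


def normal_equations(g, x):
    """Return (transpose(g) * g, transpose(g) * x) for a two-column g."""
    gtg = [[sum(row[r] * row[c] for row in g) for c in range(2)]
           for r in range(2)]
    gtx = [sum(row[r] * xi for row, xi in zip(g, x)) for r in range(2)]
    return gtg, gtx


def solve_2x2(a, b):
    """Solve a * beta = b for a 2x2 matrix a using its explicit inverse."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


gtg, gtx = normal_equations(g, x)
beta = solve_2x2(gtg, gtx)
print(gtg, gtx, beta)  # [[6.0, 3.0], [3.0, 3.0]] [21.0, 15.0] [2.0, 3.0]
```

With this coding, beta_0 = 2 is the mean of group a (1, 2, 3) and beta_1 = 3 is the difference between the group means (5 - 2).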

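The SWEEP step Jason describes can be sketched in the same way. The sketch below assumes one common textbook form of the operator (divide the pivot row by the pivot, negate and divide the pivot column, replace the pivot by its reciprocal); it is not a transcription of PSPP's sweep.c or `reg_sweep`, and `cross_products`, `sweep` and `sweep_out_groups` are hypothetical names. It sweeps the uncentred cross-product matrix of the columns (grand mean, group indicator, x), first on the grand-mean pivot and then on the group pivot, and compares the group-b coding with a group-a coding of the second column.

```python
def cross_products(columns):
    """Uncentred cross-product matrix transpose(Z) * Z for Z's columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
            for ci in columns]


def sweep(a, k):
    """Return a copy of square matrix a swept on pivot k."""
    n, d = len(a), a[k][k]
    out = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    for j in range(n):
        if j != k:
            out[k][j] = a[k][j] / d    # pivot row: divided by the pivot
            out[j][k] = -a[j][k] / d   # pivot column: negated and divided
    out[k][k] = 1.0 / d
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ones = [1.0] * 6                          # "grand mean" column of g
group_b = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # coding used in the reply
group_a = [1.0 - b for b in group_b]      # alternative coding


def sweep_out_groups(indicator):
    """Sweep the mean and group pivots; return (total SS, within SS, beta)."""
    a = cross_products([ones, indicator, x])
    a = sweep(a, 0)       # lower-right 2x2 block is now mean-centred
    total_ss = a[2][2]
    a = sweep(a, 1)
    within_ss = a[2][2]   # residual sum of squares
    return total_ss, within_ss, [a[0][2], a[1][2]]


print(sweep_out_groups(group_b))  # (17.5, 4.0, [2.0, 3.0])
print(sweep_out_groups(group_a))  # (17.5, 4.0, [5.0, -3.0])
```

After the first sweep the x diagonal holds the total corrected sum of squares (17.5); after the second it holds the within-group sum of squares (4.0), so the between-group sum of squares is 13.5. The swept x column holds beta, which changes with the coding, while the sums of squares do not.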
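The same sweep can be applied to a cross-product matrix like M in John's message, with no raw data, provided (an assumption here) that M holds mean-centred sums of squares and cross-products, not divided by n. The uncentred sum of squares of a 0/1 indicator would be a whole number, so the 1.5 entry points that way. Centring has the same effect as sweeping the grand-mean column, so only the g1 pivot is left to sweep. The `sweep` helper is a hypothetical illustration of the textbook operator, not PSPP's `reg_sweep`.

```python
def sweep(a, k):
    """Return a copy of square matrix a swept on pivot k."""
    n, d = len(a), a[k][k]
    out = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    for j in range(n):
        if j != k:
            out[k][j] = a[k][j] / d    # pivot row: divided by the pivot
            out[j][k] = -a[j][k] / d   # pivot column: negated and divided
    out[k][k] = 1.0 / d
    return out


# Cross-product matrix M from John's message: index 0 is x, index 1 is g1.
# ASSUMPTION: M holds mean-centred sums of squares and cross-products.
M = [[16.0, 3.0],
     [3.0, 1.5]]

total_ss = M[0][0]               # total (corrected) sum of squares of x
swept = sweep(M, 1)              # sweep out the group indicator g1
within_ss = swept[0][0]          # residual, i.e. within-group, sum of squares
between_ss = total_ss - within_ss
print(within_ss, between_ss)     # 10.0 6.0
```

With these numbers the within-group sum of squares is 10 and the between-group sum of squares is 6. Turning them into mean squares and an F ratio also needs the number of cases, which M by itself does not record.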