Re: GLM/ANOVA
From: Ed
Subject: Re: GLM/ANOVA
Date: Sat, 7 Jun 2008 10:09:13 +0100
2008/6/5 Jason Stover <address@hidden>:
> This is a nice way to understand the problem, but I don't think it's
> the right way to code it. I would like to keep lib/linreg as a
> univariate library. That way, it is useful to other procedures in an
> obvious way ("lib/linreg: solve the normal equations for me now!")
> Also, it's complex enough dealing with Level 2 linear algebraic
> operations. Making it deal with Level 3 to solve systems of matrices
> could be done more easily by writing another, higher-level library
> that uses lib/linreg to do the lower-level solving of systems.
I don't fully follow this. I would have thought the best option is
either to reuse linreg, or to write a separate routine that does all
of its work through matrix algebra calls. Calling a vector routine
repeatedly to perform matrix algebra must be slow, and it surely
defeats the blocking code that makes matrix multiplication fast. So my
suggestion is to have an mlinreg.{c,h} that simply does linreg with
matrices, if reusing linreg is problematic.
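
To make the point concrete, here is a minimal sketch of what the core of
such an mlinreg could look like. It assumes the cross products X'X and
X'Y have already been formed, factors X'X once, and then solves for every
response column in the same pass. The name mlinreg_solve, the row-major
layout and the LDL' factorization are all assumptions for illustration;
none of this is existing lib/linreg code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: solve (X'X) B = X'Y for all Q response columns.
   XTX is P x P, row-major, symmetric positive definite; it is
   overwritten with its L D L' factors (unit lower-triangular L below
   the diagonal, D on the diagonal).  XTY is P x Q, row-major, and is
   overwritten with the coefficient matrix B.
   Returns 0 on success, -1 if XTX is not positive definite. */
int
mlinreg_solve (double *xtx, double *xty, size_t p, size_t q)
{
  assert (xtx != NULL && xty != NULL);

  /* Factor once, regardless of how many response columns there are. */
  for (size_t j = 0; j < p; j++)
    {
      double d = xtx[j * p + j];
      for (size_t k = 0; k < j; k++)
        d -= xtx[j * p + k] * xtx[j * p + k] * xtx[k * p + k];
      if (d <= 0.0)
        return -1;
      xtx[j * p + j] = d;
      for (size_t i = j + 1; i < p; i++)
        {
          double s = xtx[i * p + j];
          for (size_t k = 0; k < j; k++)
            s -= xtx[i * p + k] * xtx[j * p + k] * xtx[k * p + k];
          xtx[i * p + j] = s / d;
        }
    }

  /* Forward substitution: L Z = X'Y, all columns together. */
  for (size_t i = 0; i < p; i++)
    for (size_t c = 0; c < q; c++)
      for (size_t k = 0; k < i; k++)
        xty[i * q + c] -= xtx[i * p + k] * xty[k * q + c];

  /* Diagonal scaling: D W = Z. */
  for (size_t i = 0; i < p; i++)
    for (size_t c = 0; c < q; c++)
      xty[i * q + c] /= xtx[i * p + i];

  /* Back substitution: L' B = W. */
  for (size_t i = p; i-- > 0;)
    for (size_t c = 0; c < q; c++)
      for (size_t k = i + 1; k < p; k++)
        xty[i * q + c] -= xtx[k * p + i] * xty[k * q + c];

  return 0;
}
```

With q = 1 this reduces to the ordinary univariate solve, which is why a
single routine could plausibly serve both cases.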
> Also, most users aren't going to need the full matrix systems since
> most users are just doing the usual linear regression with qualitative
> variables. So unless the user specifically asks for MANOVA,
> lib/linreg can already handle the job.
I don't want to put words into your mouth, I just want to check if I
got this right. You seem to be saying that linreg is basically the
core of GLM univariate from SPSS already, which is true except for the
handling of random factors. I guess since the solution for those is
just postmultiplication by another matrix, this is fine.
In that case, perhaps the sensible thing for me to aim for first is
to provide the GLM Univariate functionality using linreg. The first
steps would then be support for more complex models in the design
matrix, the various types of sums of squares, and the UNIANOVA
command, which seems reasonable.
>
> There is another reason not to change lib/linreg: lib/linreg currently
> assumes it has been passed a matrix consisting of all the data. I
> would like to change this to make PSPP better able to handle large
> data sets. This will mean that lib/linreg is going to have to know
> whether it has been passed all the data, or products of the design
> matrix and the vector of the dependent variable. That change to
> lib/linreg will make it more complex than it already is.
I guess my argument for reuse is just that one might imagine most of
the code to handle large datasets would be shared between the two
routines. If it's simply two execution paths in one routine, then
there's no replica to maintain; essentially, one routine gets it
"free". The more sophisticated and complicated the code is, the more
benefit to not coding it twice.
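
As a rough sketch of the "two execution paths" idea: both the in-memory
path and the large-data path can reduce to the same accumulator of cross
products, so everything downstream of it is written once. The struct and
function names and the MAX_P cap below are hypothetical, chosen only for
illustration, and are not part of lib/linreg.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_P = 8 };             /* Hypothetical cap on predictors. */

/* Cross products X'X (p x p, row-major) and X'y (length p). */
struct xprod
  {
    size_t p;
    double xtx[MAX_P * MAX_P];
    double xty[MAX_P];
  };

void
xprod_init (struct xprod *xp, size_t p)
{
  assert (p <= MAX_P);
  memset (xp, 0, sizeof *xp);
  xp->p = p;
}

/* Path for large data: fold in one case at a time, so the full data
   matrix never has to be held in memory. */
void
xprod_add_case (struct xprod *xp, const double *x, double y)
{
  for (size_t i = 0; i < xp->p; i++)
    {
      for (size_t j = 0; j < xp->p; j++)
        xp->xtx[i * xp->p + j] += x[i] * x[j];
      xp->xty[i] += x[i] * y;
    }
}

/* Path for in-memory data: X is n x p, row-major.  It reduces to the
   same accumulator, so the solver only ever sees a struct xprod. */
void
xprod_from_data (struct xprod *xp, const double *X, const double *y,
                 size_t n, size_t p)
{
  xprod_init (xp, p);
  for (size_t r = 0; r < n; r++)
    xprod_add_case (xp, X + r * p, y[r]);
}
```

A matrix version would only need xty to become p x q; the accumulation
logic stays the same.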
> We will have to look up the more abstruse sums of squares and
> statistics as we need them. I don't know if all that stuff should be
> in a linear regression library, or outside in the code of a
> procedure. It should be outside in a procedure if it's not something
> that other procedures might need; inside a library otherwise.
I was just going to have a sumofsquares.{c,h} somewhere that can take
a set of regression results and convert it into sums of squares of the
various types. Presumably it would be called by whatever drives the
regression stuff, probably the UNIANOVA command or something similar.
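
For the simplest case, Type I (sequential) sums of squares, the
conversion is just a difference of residual sums of squares between
nested fits. A minimal sketch, assuming the driver has already fitted
the models in order and collected their residual sums of squares; the
function name and argument layout are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of Type I (sequential) sums of squares.
   RSS has N_TERMS + 1 entries: rss[0] is the residual sum of squares
   of the intercept-only model, and rss[k] is the residual sum of
   squares after terms 1..k have been added in order.  On return,
   ss[k] holds the reduction in residual sum of squares due to adding
   term k + 1 to the terms before it. */
void
type1_ss (const double *rss, size_t n_terms, double *ss)
{
  assert (rss != NULL && ss != NULL);
  for (size_t k = 0; k < n_terms; k++)
    ss[k] = rss[k] - rss[k + 1];
}
```

The other types need different pairs of models to compare, so they would
need more than a single ordered list of fits.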
> There is code for making design matrices in
> src/math/design-matrix.[ch]. It needs work, but it is a start. The big
> problem there is the "accounting" problem of mapping values of a qualitative
> variable back and forth to vectors with binary entries.
I think this is soluble. The stuff that's already in place is solid.
From here, it's just a matter of deciding where to do the accounting.
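
One possible shape for that accounting, as a sketch only: keep a table of
the distinct values seen for a qualitative variable, and encode each
value as a vector of binary entries with the first level as the reference
(all zeros). The names, the MAX_LEVELS cap and the use of doubles for
values are assumptions here and are not taken from
src/math/design-matrix.[ch].

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_LEVELS = 16 };       /* Hypothetical cap on distinct values. */

struct cat_map
  {
    size_t n_levels;
    double levels[MAX_LEVELS];  /* Distinct values, in order first seen. */
  };

/* Returns the index of VALUE, adding it as a new level if needed. */
size_t
cat_map_index (struct cat_map *m, double value)
{
  for (size_t i = 0; i < m->n_levels; i++)
    if (m->levels[i] == value)
      return i;
  assert (m->n_levels < MAX_LEVELS);
  m->levels[m->n_levels] = value;
  return m->n_levels++;
}

/* Writes n_levels - 1 binary entries to OUT for the level at INDEX.
   Level 0 is the reference level and encodes as all zeros. */
void
cat_map_encode (const struct cat_map *m, size_t index, double *out)
{
  assert (index < m->n_levels);
  for (size_t j = 1; j < m->n_levels; j++)
    out[j - 1] = j == index ? 1.0 : 0.0;
}

/* Inverse mapping: recovers the original value from a binary vector. */
double
cat_map_decode (const struct cat_map *m, const double *in)
{
  for (size_t j = 1; j < m->n_levels; j++)
    if (in[j - 1] != 0.0)
      return m->levels[j];
  return m->levels[0];
}
```

Note that encoding needs the full set of levels to be known first, so
this implies one pass over the data to build the table before the design
matrix is filled in.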
Ed