Re: GLM procedure
From: Jason Stover
Subject: Re: GLM procedure
Date: Fri, 17 Oct 2008 11:21:23 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Oct 17, 2008 at 10:42:52AM +0800, John Darrington wrote:
> On Thu, Oct 16, 2008 at 04:35:40PM -0400, Jason Stover wrote:
>
> > About GLM: I think glm.q is going to grow large enough to make glm.q
> > illegible. It has to be able to do a lot of different tasks, and they
> > should be split among different files.
> >
> > So maybe we should all hack on the GLM procedure. The guts of it are
> > just least-squares, but translating from the user's syntax to the
> > particular covariance matrix and back will be a lot of work. I can
> > handle the least-squares coding and the post-fit computations, but I
> > think someone else should write the code to read the syntax, dole out
> > the work to the linear model code, and organize the output. I don't
> > mind doing it all myself if no one else wants to pitch in, but it will
> > take a lot longer that way.
>
>
> Dividing the work load is certainly a good idea. But this is where I
> start to show my ignorance about advanced statistical methods.
>
> * Can you summarize what the different tasks are that GLM needs to
> perform? Are they really separate tasks or specialisations of one
> general task? The name GLM leads me to think the latter.

Before I answer this, remember that I haven't actually coded it all,
so I don't know where exactly the headaches will be. Some of what I
mention could be easy. Some things I didn't think of may be very hard.

The most common tasks I can think of now are:

1. Typical linear model fitting, the kind that REGRESSION already does.

2. What I could call "disguised" typical linear model fitting: That
is, REGRESSION already does it, but the user doesn't think of it this
way. Examples are any analyses in which the user specifies variables
that are only fixed (as opposed to random), not nested, which may or
may not include interactions. The results are reported differently than
in case 1.

3. Regression with random effects. This means getting the typical
least-squares estimates for the parameters, but the estimates of the
mean squares would differ.

4. Nested effects. I'll have to look up computations for parameter
estimates and nested effects. Combinations of nested and random effects
are also possible.

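
As a concrete illustration of case 3 (a standard textbook situation, assumed here for illustration and not worked out in this thread): take a balanced two-way layout with factor A fixed at a levels, factor B random at b levels, and n replicates per cell. The sums of squares, and hence the mean squares MS_A, MS_AB and MS_E, are computed exactly as in the all-fixed model. What changes is the expected mean square of A, and therefore which mean square goes in the denominator of its F statistic. The symbols below are assumptions of this sketch.

```latex
% All effects fixed:
%   E[MS_A] = \sigma^2 + \frac{bn}{a-1}\sum_{i=1}^{a}\alpha_i^2
F_A^{\text{fixed}} = \frac{MS_A}{MS_E}

% A fixed, B random (so the A x B interaction is random):
%   E[MS_A]    = \sigma^2 + n\,\sigma_{AB}^2 + \frac{bn}{a-1}\sum_{i=1}^{a}\alpha_i^2
%   E[MS_{AB}] = \sigma^2 + n\,\sigma_{AB}^2
F_A^{\text{mixed}} = \frac{MS_A}{MS_{AB}}
```

So the least-squares fit itself can be shared; only the post-fit step that picks the error term needs to know which effects are random.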
All of the above are univariate, meaning the code in linreg.c could
handle them, with some tweaks (perhaps outside linreg.c) to estimate
the different mean squares, etc. So we can think of them as specializations
of the task already done in linreg.c. But the specializations will be
difficult. I think the trouble will be in taking the users' input and
translating into something linreg.c can take, then translating back
into output the users see. This was the biggest difficulty for me
writing REGRESSION, which was an easier procedure than GLM.
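
To make the "translating into something linreg.c can take" step concrete, here is a minimal sketch of dummy (indicator) coding for a single fixed factor, which is what turns case 2 into an ordinary regression. The function name dummy_code, the row-major layout, and the choice of level 0 as the reference level are all assumptions made for this illustration. None of it is taken from PSPP or from linreg.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of PSPP.

   Expands one categorical factor with N_LEVELS levels (coded
   0 .. N_LEVELS - 1) into an N x N_LEVELS design matrix stored in
   row-major order.  Column 0 is the intercept; column k (k >= 1) is a
   0/1 indicator for level k.  Level 0 is the reference level, so it
   gets no indicator column of its own.

   Returns a malloc'd matrix that the caller must free, or NULL on
   allocation failure or if a level code is out of range. */
static double *
dummy_code (const int *level, size_t n, size_t n_levels)
{
  double *x = calloc (n * n_levels, sizeof *x);
  if (x == NULL)
    return NULL;

  for (size_t i = 0; i < n; i++)
    {
      if (level[i] < 0 || (size_t) level[i] >= n_levels)
        {
          free (x);
          return NULL;
        }
      x[i * n_levels] = 1.0;                        /* Intercept. */
      if (level[i] > 0)
        x[i * n_levels + (size_t) level[i]] = 1.0;  /* Indicator. */
    }
  return x;
}
```

For example, calling dummy_code on the level codes {0, 1, 2, 1, 0} with n_levels = 3 yields the rows (1,0,0), (1,1,0), (1,0,1), (1,1,0), (1,0,0). A least-squares routine can fit that matrix without knowing the columns came from a factor. The reverse translation, mapping fitted coefficients back to per-effect sums of squares for the output, is the part that would still need GLM-specific code.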
Then there are the multivariate analyses:
5. Repeated measures: In this case, there are multiple dependent
variables, which are assumed to have some covariance structure that
needs accounting for. The code in linreg.c is inadequate for such a
task. Unfortunately, this is also one of the most common uses of GLM.
The first four cases "just" need the code in linreg.c, maybe with some
extra code at the end to tailor the final computations of sums of squares
and p-values. The last one will need more serious work in src/math.
> * Can you post some example syntax of the more common ways that you
> think people will use GLM?
I'll look for some. The reference manuals I just read show only how to
use the gui.
> * Are there any other commands in the spss language that we should think
> about at the same time? eg MANOVA ANOVA ANACOVA UNIANOVA
I think if cases 1 through 4 above can be handled by GLM, then we
would also be able to write an ANOVA procedure without much more
trouble. Case 5 above corresponds to MANOVA.
But I don't think any of those procedures is used nearly as often as
GLM.
-Jason