[address@hidden: Re: Regression results need checking ?]

From: Jason Stover
Subject: [address@hidden: Re: Regression results need checking ?]
Date: Sun, 4 Feb 2007 12:05:43 -0500
User-agent: Mutt/1.5.10i

I forgot to CC the list.

----- Forwarded message from Jason Stover <address@hidden> -----

Date: Sun, 4 Feb 2007 11:50:38 -0500
From: Jason Stover <address@hidden>
To: John Darrington <address@hidden>
Subject: Re: Regression results need checking ?
In-Reply-To: <address@hidden>
User-Agent: Mutt/1.5.10i

I just checked in a fix for the computation of the p-values.

I can fix the computation of the standardized coefficients, but before
I do, I have a question. Is there a place where the regression
procedure can just read the standard deviation for a variable, or must
it compute the standard deviation itself? And if the regression 
procedure must compute the standard deviation itself, is there 
a single routine somewhere in src that it can use, or does it
need its own?
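
For reference, this is the quantity in question: a standardized coefficient is the raw coefficient rescaled by the ratio of sample standard deviations, beta_j = b_j * s_xj / s_y. The following is a minimal Python sketch rather than PSPP code. The function names are hypothetical, and it assumes the data passed in already has missing cases removed.

```python
import math


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def standardized_coefficient(b, x, y):
    """Rescale raw coefficient b for predictor x and response y.

    Hypothetical helper: beta = b * sd(x) / sd(y).
    """
    return b * sample_sd(x) / sample_sd(y)


# Example: y is exactly 2 * x, so the raw slope is 2 and the
# standardized slope is 1 (a perfect linear relationship).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
beta = standardized_coefficient(2.0, x, y)
```

The only inputs beyond the fitted coefficient are the two standard deviations, which is why the question above comes down to where those should be computed.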

I ask because this test data set has missing data, and
regression already has its own way of dealing with missing cases.  It
would be nice if there were another standard procedure to call to
compute descriptive statistics without having to make regression aware
of yet another way to handle missing data. Computing means, standard
deviations, and other univariate statistics is a common enough task
that there should be one place to do it.
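
To illustrate what such a shared routine might look like, here is a minimal Python sketch. It assumes listwise deletion (a case is dropped if any of its values is missing), which may or may not match what regression actually does. Missing values are represented by None, and the function names are made up for this example.

```python
import math


def complete_cases(rows):
    """Keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def column_stats(rows):
    """Return (n, [(mean, sd), ...]) computed over complete cases only."""
    cases = complete_cases(rows)
    n = len(cases)
    if n < 2:
        raise ValueError("need at least two complete cases")
    stats = []
    for j in range(len(cases[0])):
        col = [row[j] for row in cases]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        stats.append((mean, math.sqrt(var)))
    return n, stats


# Example: the second case has a missing value and is dropped.
rows = [(1.0, 2.0), (2.0, None), (3.0, 6.0), (5.0, 10.0)]
n, stats = column_stats(rows)
```

Filtering the cases in one place means every caller sees the same n, means, and standard deviations, so no procedure needs its own missing-data logic for these statistics.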

So as long as we're on the topic, it might be nice to have a couple of
routines in src/math to compute such descriptive statistics, and maybe
even store them in a cache. Would a pool serve this purpose? I guess
that by raising the issue, I'm volunteering to do it.
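
A rough sketch of the caching idea, in Python rather than C and with hypothetical names throughout: statistics for a variable are computed on first request with a single-pass (Welford) update, then reused. Note that this version skips missing values per variable, which is an assumption and is not the same as dropping whole cases.

```python
import math


class DescriptivesCache:
    """Compute n, mean, and sd per variable once, then reuse them."""

    def __init__(self, data):
        self._data = data        # dict: variable name -> list of values
        self._cache = {}
        self.computations = 0    # counts actual passes over the data

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._compute(self._data[name])
            self.computations += 1
        return self._cache[name]

    @staticmethod
    def _compute(values):
        # Welford's single-pass update; None marks a missing value.
        n, mean, m2 = 0, 0.0, 0.0
        for v in values:
            if v is None:
                continue
            n += 1
            delta = v - mean
            mean += delta / n
            m2 += delta * (v - mean)
        sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
        return {"n": n, "mean": mean, "sd": sd}


cache = DescriptivesCache({"x": [2.0, None, 4.0, 6.0]})
first = cache.get("x")
second = cache.get("x")   # served from the cache, no second pass
```

The single-pass update matters if the data can only be read sequentially; the cache itself is just a lookup table keyed by variable.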


On Sat, Feb 03, 2007 at 10:50:17PM -0500, Jason Stover wrote:
> For the first example, it looks like pspp and spss are computing the same
> basic statistics. The most important values in the output are those in the
> ANOVA table, the coefficients and their standard errors. All these agree
> with the values shown in the example. That said, there are some discrepancies
> which need further attention:
> 1. The "standard error of the estimate" in the model summary table.
>    On the page whose link you sent, this value is about 64, but pspp
>    reports a value of about .08. I'll have to check this one later,
>    but for now, I suspect it's the web page that has the incorrect
>    value. If I remember correctly, the value should be the standard
>    error of R-square. R-square is always between 0 and 1, and
>    therefore should not have a standard error larger than 1, as the
>    web page reports.
> 2. The "Coefficients" table, in the column referring to the
>    standardized coefficients. This is something I'll have to check
>    more closely later, but pspp seems to report the incorrect values
>    here.
> 3. The discrepancy in the "Sig." column needs to be checked. I'm guessing
>    it's something simple, like a miscalculation of degrees of freedom. This
>    column is filled in by a straightforward computation of the t distribution.
> I'll look into this more over the next few days and patch as necessary.
> -Jason
> On Sat, Feb 03, 2007 at 08:25:06AM +0900, John Darrington wrote:
> > When I try out the exercises at
> >
> > using pspp, the numbers I get are quite different to those in their
> > examples.
> > 
> > Do we have something wrong or do they?
> > 
> > J'
> > 
> > -- 
> > PGP Public key ID: 1024D/2DE827B3 
> > fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
> > See or any PGP keyserver for public key.
> > 

----- End forwarded message -----
