Re: [address@hidden: Re: Regression results need checking ?]

From: Ben Pfaff
Subject: Re: [address@hidden: Re: Regression results need checking ?]
Date: Sun, 04 Feb 2007 12:28:23 -0800
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

Jason Stover <address@hidden> writes:

> I can fix the computation of the standardized coefficients, but before
> I do, I have a question. Is there a place where the regression
> procedure can just read the standard deviation for a variable, or must
> it compute the standard deviation itself? And if the regression 
> procedure must compute the standard deviation itself, is there 
> a single routine somewhere in src that it can use, or does it
> need its own?
>
> The reason I ask is because this test data set has missing data, and
> regression already has its own way of dealing with missing cases.  It
> would be nice if there were another standard procedure to call to
> compute descriptive statistics without having to make regression aware
> of yet another way to handle missing data. Computing means, standard
> deviations, and other univariate statistics is a common enough task
> that there should be one place to do it.

This has been on my to-do list for a long time.  I agree that it
is important to solve it.  I consider it somewhat difficult to
solve because of these factors:

        1. Different procedures want to include different data in
           the calculations.  Some want moments by SPLIT FILE
           groups or by other break groups, some want them for
           the entire file.  Some want to drop user-missing
           values, some want to drop even non-missing values
           when other variables are missing.  So we need a way to
           identify and distinguish these different needs.

        2. There needs to be a way to detect when the active file
           has changed, so that cached calculations can be
           dropped.  The same mechanism is likely to be useful
           for other optimizations if it is general-purpose.

        3. We need a good data structure to store all this.  I
           was thinking about that a while ago and didn't come up
           with anything that made me entirely happy, but I'm
           sure that a good solution exists.
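
To make the three points above a little more concrete, here is one
possible shape, offered as a rough sketch under assumptions rather
than a proposed design.  Every name in it (stats_key, stats_cache,
missing_policy, and so on) is hypothetical and nothing here exists in
the source tree.  The key records which data a procedure wants
(point 1), a generation counter that is bumped whenever the active
file changes makes old entries stale (point 2), and a small
direct-mapped table stands in for the real data structure (point 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ways a procedure may treat missing data. */
enum missing_policy
  {
    EXCLUDE_NONE,           /* Use every value. */
    EXCLUDE_USER_MISSING,   /* Drop user-missing values. */
    EXCLUDE_LISTWISE        /* Drop a case if any other variable is missing.
                               A real key would also need the variable set. */
  };

/* Identifies one cached result: which variable, under which policy,
   for which break group (0 meaning the entire file). */
struct stats_key
  {
    size_t var_idx;
    enum missing_policy policy;
    size_t group;
  };

struct stats_entry
  {
    bool in_use;
    unsigned long generation;   /* Cache generation when stored. */
    struct stats_key key;
    double mean, sd;
  };

#define STATS_CACHE_SIZE 64

struct stats_cache
  {
    unsigned long generation;   /* Bumped when the active file changes. */
    struct stats_entry entries[STATS_CACHE_SIZE];
  };

void
stats_cache_init (struct stats_cache *cache)
{
  assert (cache != NULL);
  memset (cache, 0, sizeof *cache);
}

/* Call whenever the active file changes; all old entries go stale. */
void
stats_cache_data_changed (struct stats_cache *cache)
{
  assert (cache != NULL);
  cache->generation++;
}

static bool
key_equal (const struct stats_key *a, const struct stats_key *b)
{
  return (a->var_idx == b->var_idx && a->policy == b->policy
          && a->group == b->group);
}

static size_t
slot_for (const struct stats_key *key)
{
  return ((key->var_idx * 31 + (size_t) key->policy * 7 + key->group)
          % STATS_CACHE_SIZE);
}

/* Stores MEAN and SD under KEY, overwriting whatever shared its slot. */
void
stats_cache_store (struct stats_cache *cache, const struct stats_key *key,
                   double mean, double sd)
{
  struct stats_entry *e = &cache->entries[slot_for (key)];
  e->in_use = true;
  e->generation = cache->generation;
  e->key = *key;
  e->mean = mean;
  e->sd = sd;
}

/* Returns true and fills *MEAN and *SD only if a current entry for
   exactly this KEY exists. */
bool
stats_cache_lookup (const struct stats_cache *cache,
                    const struct stats_key *key, double *mean, double *sd)
{
  const struct stats_entry *e = &cache->entries[slot_for (key)];
  if (!e->in_use || e->generation != cache->generation
      || !key_equal (&e->key, key))
    return false;
  *mean = e->mean;
  *sd = e->sd;
  return true;
}
```

The direct-mapped table is only there to keep the sketch short; it
silently evicts on collision and does nothing to share work between
keys that differ only in policy, which is part of what makes the real
problem harder.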

> So as long as we're on the topic, it might be nice to have a couple of
> routines in src/math to compute such descriptive statistics, and maybe
> even store them in a cache. Would a pool serve this purpose? I guess
> by raising the issue, this means I'm volunteering to do it.

I'm not sure that a pool is really a solution, but it might be
part of one.

I don't think you're obligated to build this.  It's an ongoing
issue and I'd rather have a good general-purpose solution than a
hasty one.

-- 
"Premature optimization is the root of all evil."
--D. E. Knuth, "Structured Programming with go to Statements"
