pspp-dev

regression and glm with big data


From: Jason Stover
Subject: regression and glm with big data
Date: Tue, 14 Aug 2007 22:45:58 -0400
User-agent: Mutt/1.5.10i

Right now, linreg.c, regression.q and glm.q don't handle large data
sets very well. The problem is that the regression and (currently
fetal) glm procedures store the entire data set in memory, then pass
the data to pspp_linreg (), which finds the least squares estimates.

Storing the entire data set in memory isn't necessary, just easier to
code. PSPP could handle much bigger data sets if, in the
casereader_read loop, it computed two matrix products from the data in
a single pass, then sent those much smaller results to
pspp_linreg ().
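
For illustration, here is a minimal sketch of the single-pass idea,
assuming the two matrix products are the usual least squares
quantities X'X and X'y (the message does not name them). Every
identifier below (struct ls_accum, ls_accum_init, ls_accum_add_case,
MAX_VARS) is hypothetical and not part of PSPP; in PSPP the per-case
update would sit inside the casereader_read loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on the number of columns of X
   (predictors plus intercept), chosen only for this sketch. */
#define MAX_VARS 8

/* Hypothetical accumulator for X'X and X'y.  Its size depends only
   on the number of variables, not on the number of cases. */
struct ls_accum
  {
    size_t n_vars;                  /* Columns of X in use. */
    size_t n_cases;                 /* Cases accumulated so far. */
    double xtx[MAX_VARS][MAX_VARS]; /* Running sum of x x'. */
    double xty[MAX_VARS];           /* Running sum of x y. */
  };

/* Resets A to hold zero cases of N_VARS variables each. */
void
ls_accum_init (struct ls_accum *a, size_t n_vars)
{
  size_t i, j;

  assert (n_vars > 0 && n_vars <= MAX_VARS);
  a->n_vars = n_vars;
  a->n_cases = 0;
  for (i = 0; i < MAX_VARS; i++)
    {
      a->xty[i] = 0.0;
      for (j = 0; j < MAX_VARS; j++)
        a->xtx[i][j] = 0.0;
    }
}

/* Adds one case to A: X holds the n_vars predictor values (with a
   leading 1.0 if an intercept is wanted) and Y is the response.
   The case itself does not need to be kept afterward. */
void
ls_accum_add_case (struct ls_accum *a, const double *x, double y)
{
  size_t i, j;

  for (i = 0; i < a->n_vars; i++)
    {
      a->xty[i] += x[i] * y;
      for (j = 0; j < a->n_vars; j++)
        a->xtx[i][j] += x[i] * x[j];
    }
  a->n_cases++;
}
```

After the loop, the estimates would come from solving the normal
equations (X'X) b = X'y, so only a p-by-p matrix and a p-vector need
to be handed on, however many cases were read. One caveat of this
approach: forming X'X explicitly can lose numerical precision
compared with factoring the full data matrix, which may be one reason
to keep the all-data path available as well.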

But there may be tasks for which pspp_linreg () should accept all the
data as a single matrix, so it should probably be able to do that,
too.

My question is: Should I do this now, or wait until after the release?
It will probably change a lot of code in linreg.c, and could introduce
several bugs. The benefit would be to make any procedure that needs
regression able to run with very large data sets.

-Jason



