Re: Optimisation of statistic calculations
From: Jason Stover
Subject: Re: Optimisation of statistic calculations
Date: Wed, 3 Nov 2004 14:03:10 +0000
User-agent: Mutt/1.4.2.1i
Two ways to attenuate the eventual bloat of PSPP are:
1. As you mentioned, cache the common and, most importantly, the
sufficient statistics. Have every statistical procedure cache its
sufficient statistics for later use. Once computed, they can be
reused by that procedure or by others. Sufficient statistics are
used frequently, so caching them could avoid a lot of recomputation.
This is especially beneficial when working with distributions in the
exponential family, which underlie the most commonly run analyses.
2. Use a generic optimization module. GSL provides one that could be
hooked into PSPP. Different statistical estimation procedures use
the same backend algorithms (e.g., sorting for nonparametric routines
and Newton-Raphson for generalized linear models). A single optimizer,
along with other shared backend routines, can eliminate a lot of
redundancy.
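
To make point 2 concrete, here is a minimal sketch of what a shared
Newton-Raphson backend might look like. It is not the GSL interface
and not existing PSPP code: the names score_fn, newton_raphson,
poisson_stats and poisson_score are hypothetical, invented for this
illustration. The example procedure estimates a Poisson rate from its
sufficient statistics (n and the sum of the observations) only. The
closed form sum/n is known there, so it serves purely as a check on
the optimizer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: evaluates the score F (the derivative of the
   log-likelihood) and its derivative DF at X.  AUX carries whatever
   cached statistics the calling procedure needs. */
typedef void score_fn (double x, void *aux, double *f, double *df);

/* Generic one-dimensional Newton-Raphson root finder.
   Returns 0 and stores the root in *ROOT on convergence,
   -1 if the derivative vanishes or MAX_ITER is exhausted. */
static int
newton_raphson (score_fn *score, void *aux, double x0,
                double tol, int max_iter, double *root)
{
  double x = x0;

  assert (score != NULL && root != NULL);
  for (int i = 0; i < max_iter; i++)
    {
      double f, df, step;

      score (x, aux, &f, &df);
      if (df == 0.0)
        return -1;
      step = f / df;
      x -= step;
      if ((step < 0 ? -step : step) < tol)
        {
          *root = x;
          return 0;
        }
    }
  return -1;
}

/* Sufficient statistics for a Poisson sample (assumed cached elsewhere). */
struct poisson_stats
  {
    double n;    /* Number of observations. */
    double sum;  /* Sum of the observations. */
  };

/* Score of the Poisson log-likelihood with respect to the rate LAMBDA:
   sum / lambda - n, with derivative -sum / lambda^2. */
static void
poisson_score (double lambda, void *aux, double *f, double *df)
{
  const struct poisson_stats *s = aux;

  *f = s->sum / lambda - s->n;
  *df = -s->sum / (lambda * lambda);
}
```

With n = 4 and sum = 10, calling
newton_raphson (poisson_score, &s, 1.0, 1e-12, 50, &lambda) should
converge to the closed-form estimate 2.5. Any other procedure would
supply its own score callback and reuse the same loop, which is the
redundancy the single optimizer is meant to remove.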
-Jason
On Wed, Nov 03, 2004 at 07:35:45PM +0800, John Darrington wrote:
> As we add more commands to PSPP, there is considerable repetition
> of code involving the calculation of common statistics, e.g. sum,
> sum of squares, variance, etc. An ad hoc approach can lead, and has
> led, to the same calculations being unnecessarily repeated, which
> will make PSPP slower than it needs to be.
>
> The way it's going, PSPP is going to bloat, and have large chunks of
> disjoint code, duplicating the same basic algorithms time and time again.
>
> So I'm looking at introducing a framework to allow the optimal reuse
> of statistics already calculated. This will involve each statistic
> featuring in a DAG, and an engine which traverses the DAG in
> postorder. In fact, 2 DAGs will be required, because some stats
> require others to be completely precalculated.
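
A minimal sketch of the single-DAG part of this idea, under some loud
assumptions: the data fit in one in-memory array, and the names
stat_node, stat_eval and the compute_* functions are hypothetical,
not existing PSPP code. Each node evaluates its dependencies before
itself (postorder) and caches its value, so a statistic shared by
several others is computed only once. The two-DAG case mentioned
above, where some statistics need others fully precalculated over a
separate data pass, is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 2

/* One statistic in the dependency graph (hypothetical layout). */
struct stat_node
  {
    const char *name;
    struct stat_node *deps[MAX_DEPS];  /* Statistics this one needs. */
    size_t n_deps;
    double (*compute) (const struct stat_node *, const double *, size_t);
    double value;                      /* Cached result. */
    int done;                          /* Nonzero once VALUE is valid. */
    int n_evals;                       /* How often COMPUTE has run. */
  };

/* Evaluates NODE in postorder: dependencies first, then NODE itself.
   A node that is already done returns its cached value. */
static double
stat_eval (struct stat_node *node, const double *data, size_t n)
{
  if (!node->done)
    {
      for (size_t i = 0; i < node->n_deps; i++)
        stat_eval (node->deps[i], data, n);
      node->value = node->compute (node, data, n);
      node->done = 1;
      node->n_evals++;
    }
  return node->value;
}

static double
compute_sum (const struct stat_node *node, const double *data, size_t n)
{
  double s = 0.0;

  (void) node;
  for (size_t i = 0; i < n; i++)
    s += data[i];
  return s;
}

static double
compute_sumsq (const struct stat_node *node, const double *data, size_t n)
{
  double ss = 0.0;

  (void) node;
  for (size_t i = 0; i < n; i++)
    ss += data[i] * data[i];
  return ss;
}

/* Mean from the cached sum; deps[0] must be the sum node. */
static double
compute_mean (const struct stat_node *node, const double *data, size_t n)
{
  (void) data;
  return node->deps[0]->value / (double) n;
}

/* Sample variance from the cached sum (deps[0]) and sum of squares
   (deps[1]).  Requires n >= 2.  This textbook formula is numerically
   fragile and is used here only to show the reuse. */
static double
compute_variance (const struct stat_node *node, const double *data, size_t n)
{
  double s = node->deps[0]->value;
  double ss = node->deps[1]->value;

  (void) data;
  return (ss - s * s / (double) n) / (double) (n - 1);
}
```

For instance, with a "mean" node depending on "sum" and a "variance"
node depending on "sum" and "sum of squares", evaluating both the mean
and the variance runs compute_sum only once; the second request finds
the cached value.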
>
> In fact, looking at some of the existing PSPP code, it appears that
> Ben may have had something similar in mind at one stage. Anyway, if
> anyone is an expert on such things, then please pipe up.
>
> J'
>
> --
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
> See http://wwwkeys.pgp.net or any PGP keyserver for public key.
--
address@hidden
SDF Public Access UNIX System - http://sdf.lonestar.org