
status report

From: Ben Pfaff
Subject: status report
Date: Thu, 10 May 2007 23:15:26 -0700
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

The last week or so I've been cleaning up the simpler-proc branch
for review and eventual merging.  I think that this process will
probably take another week or two.

I've also done some performance regression testing on
simpler-proc versus the main branch and discovered some places
where simpler-proc performance was bad, especially in sorting.  I
fixed the worst of it, but it still needs a little improvement
before it'll be ready for merge.  Probably only a few hours'
worth of work there, though.

The last few days I've started writing the PSPP developer's guide
that I mentioned a while back.  Here's a tentative outline, which
is bound to change as I continue writing:

Developer's Guide
* Introduction
* Basic concepts
** Values
** Variables
** Dictionaries
** Data sets
** Pools
** Coding conventions
* Syntax parsing
* Data processing
** Reading data
*** Casereaders generally
*** Casereaders from data files
*** Casereaders from the active file
*** Other casereaders
** Writing data
*** Casewriters generally
*** Casewriters to data files
*** Modifying the active file
**** Modifying cases obtained from active file casereaders has no real effect
**** Transformations; procedures that transform
** Transforming data
*** Sorting and merging
*** Filtering
*** Grouping
**** Ordering and interaction of filtering and grouping
*** Multiple passes over data
*** Counting cases and case weights
** Best practices
*** Multiple passes with filters versus single pass with loops
*** Sequential versus random access
*** Managing memory
*** Passing cases around
*** Renaming casereaders
*** Avoiding excessive buffering
*** Propagating errors
*** Avoid static/global data
*** Don't worry about null filters, groups, etc.
*** Be aware of reference counting semantics for cases
* Presenting output

The data processing chapter is the only one fully outlined.  I
figure the syntax parsing and output presentation chapters
shouldn't be written until the corresponding bits of PSPP are
more solid.  I have plans to work on each of those in turn after
merging simpler-proc; I'll be sure to talk them through here
before going beyond a prototype implementation.

The developer's guide is not yet checked in to simpler-proc, not
even a skeleton.  Probably I'll do an initial check-in over the

Outside of PSPP, I have two big projects that are taking up time:

        * Graduation: I turned in the first draft of my PhD
          thesis to my advisor yesterday.  I'm hoping to schedule
          my defense for late June and then graduate by the end
          of Stanford's summer quarter.  Along with thesis
          revisions and preparing my defense I'm also going to
          embark on a job search.  Probably I'll take a job
          locally at least until January, when universities start their
          faculty searches up again (I'm too late for this year's

        * Pintos: My educational operating system used at
          Stanford and elsewhere.  I'm currently working to
          integrate the contribution of a USB mass storage layer,
          which allows students to demonstrate their projects on
          real machines by running the OS off a USB flash drive.
          This is considerably more impressive than running
          inside a virtual machine as they currently do, so it
          seems worthwhile, but I'm very picky about what I put
          into Pintos so I'm having to do a lot of refactoring

Ben Pfaff
