How competent are the PSPP developers?
From: John Darrington
Subject: How competent are the PSPP developers?
Date: Fri, 12 Nov 2004 11:47:59 +0800
User-agent: Mutt/1.3.28i
I thought it was about time somebody started using PSPP for something real.
The obvious thing was using it to determine the level of effort and competence
of the PSPP developers. I got a history from cvs using:
cvs history -c -a > cvs-hist
Then I wrote a few lines of PSPP syntax:
DATA LIST FIXED
FILE='cvs-hist'
/RTYPE 1-1 (A)
WHEND 3-12 (SDATE)
WHENT 14-18 (TIME)
WHO 26-33 (A)
WHAT 40-70 (A)
.
VALUE LABELS /RTYPE 'A' 'Added' 'M' 'Modified' 'R' 'Removed'.
VARIABLE LABEL WHAT 'Filename'.
VARIABLE LABEL WHO 'Developer'.
VARIABLE LABEL WHEND 'Date'.
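For anyone who wants to cross-check the column ranges outside PSPP, here is a minimal Python sketch of the same fixed-width slicing. The sample record is hypothetical: it is padded so that each field lands in the range given to DATA LIST, and real `cvs history` output may be laid out differently.

```python
from datetime import datetime

# Column ranges copied from the DATA LIST command (1-based, inclusive).
FIELDS = {
    "RTYPE": (1, 1),
    "WHEND": (3, 12),
    "WHENT": (14, 18),
    "WHO": (26, 33),
    "WHAT": (40, 70),
}


def parse_record(line):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in FIELDS.items()
    }


# Hypothetical record, padded so each field sits in its column range.
sample = f"{'M'} {'2004-11-12'} {'03:47'} +0000 {'jmd':<8} {'1.5':<5}{'frequencies.q'}"

record = parse_record(sample)
when = datetime.strptime(record["WHEND"] + " " + record["WHENT"], "%Y-%m-%d %H:%M")
print(record, when)
```

If a field comes out truncated or shifted, the column ranges (not the statistics) are the first thing to suspect.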
First, I wanted to know who had been most active, so I ran:
SPLIT FILE BY WHO.
FREQUENCIES /RTYPE.
SPLIT FILE OFF.
Clearly, blp is the most active developer, with jmd not far behind.
However, a major part of jmd's effort simply involves deleting files.
He's deleted more files than he's added, so his total contribution is
probably negative!!
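The per-developer tables can be approximated with a few lines of Python, which is handy for checking what FREQUENCIES ought to print. This is only a sketch: the records below are made up, not taken from the real history, and `net_added` is a hypothetical name for the "additions minus removals" figure.

```python
from collections import Counter

# Hypothetical (developer, record type) pairs; not the real cvs history.
records = [
    ("blp", "M"), ("blp", "M"), ("blp", "A"),
    ("jmd", "R"), ("jmd", "R"), ("jmd", "A"), ("jmd", "M"),
]

LABELS = {"A": "Added", "M": "Modified", "R": "Removed"}


def frequencies_by_developer(rows):
    """Count record types separately for each developer."""
    tables = {}
    for who, rtype in rows:
        tables.setdefault(who, Counter())[rtype] += 1
    return tables


tables = frequencies_by_developer(records)
for who, counts in sorted(tables.items()):
    for rtype, n in sorted(counts.items()):
        print(f"{who:<8} {LABELS[rtype]:<9} {n}")

# Files added minus files removed, per developer.
net_added = {who: c["A"] - c["R"] for who, c in tables.items()}
print(net_added)
```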
But this doesn't indicate how competent the respective developers are.
So I devised a statistic to measure it as follows:
I took the difference between successive modifications to a file, noting
the developer who did the prior modification. My reasoning is that if the
developer did it right in the first place, then that file won't have to be
modified for a long time. Thus, the most competent developers will have a
very long time between successive commits on their files.
COMPUTE T= XDATE.JDAY(WHEND).
COMPUTE WHAT=RTRIM(WHAT).
SORT CASES BY WHAT, T (D).
SPLIT FILE BY WHAT.
* Number of days between file modifications.
COMPUTE DIFF = LAG(T) - T.
COMPUTE DIFF = DIFF / 3600 / 24.
VARIABLE LABEL DIFF 'Time between modification'.
LIST.
So DIFF gives me a variable which is the number of days between successive
modifications on a file.
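As a sanity check on the statistic, here is a small Python sketch of the same idea: for each file, take consecutive commits in time order and credit the gap to whoever made the earlier one. The commit data is hypothetical, and the grouping is done per file explicitly rather than through SPLIT FILE and LAG, so treat it as an illustration of the intent rather than a transcription of the PSPP code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical commits: (file, developer, timestamp).
commits = [
    ("a.c", "blp", datetime(2004, 1, 1)),
    ("a.c", "jmd", datetime(2004, 1, 11)),
    ("a.c", "blp", datetime(2004, 3, 1)),
    ("b.c", "jmd", datetime(2004, 2, 1)),
    ("b.c", "pjk", datetime(2004, 2, 3)),
]


def days_until_next_change(rows):
    """Return (developer, days) pairs: how long each commit lasted
    before the same file was modified again."""
    by_file = defaultdict(list)
    for what, who, when in rows:
        by_file[what].append((when, who))
    diffs = []
    for history in by_file.values():
        history.sort()  # oldest commit first
        for (t0, who), (t1, _next_who) in zip(history, history[1:]):
            diffs.append((who, (t1 - t0).total_seconds() / 3600 / 24))
    return diffs


diffs = days_until_next_change(commits)

per_dev = defaultdict(list)
for who, days in diffs:
    per_dev[who].append(days)
means = {who: sum(v) / len(v) for who, v in per_dev.items()}
print(means)
```

The last commit on each file produces no DIFF value, since nothing has replaced it yet.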
Like a good statistician, I want to make sure it's normally distributed, so
I do:
EXAMINE DIFF
/STATISTICS = DESCRIPTIVES
/PLOT = NPPLOT
.
It's clearly not normal, but like a good politician I'll ignore the statistics
when it suits me to do so, and only publish the ones that are favourable to my
agenda.
EXAMINE DIFF BY WHO
/STATISTICS = DESCRIPTIVES
/NOTOTAL
.
Well, of the four developers, the most competent is pjk, whose work has to be
re-done on average 58 days later. Mkiefte comes in second with a score of
32 days. Blp's work needs attention 28 days later, and jmd is the most
incompetent developer. His work needs fixing 26 days later.
Now I want to know if these differences are significant. I use the ONEWAY
command to do this.
ONEWAY DIFF BY WHO
/STATISTICS = HOMOGENEITY
/CONTRASTS = -3, 1, 1, 1
/CONTRASTS = 0, 1, -1, 0
.
I ran two planned contrast tests. The first tests whether mkiefte's result is
significantly different from the others (since I noticed it's somewhat larger).
The second shows whether there are significant differences between the two
most active developers' competence (jmd and blp).
The overall result is not significant at the 0.05 level, so we can say that in
general, all developers are equally (in)competent.
Now for the contrasts. The homogeneity of variance test is not significant, so
we use the `Assume equal variances' results:
For test 1, there is a significant contrast at 0.05, so mkiefte is significantly
more competent than the other developers.
Test 2 is not significant, so blp and jmd do not seem to be any more competent
than each other.
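For reference, the equal-variances contrast test can be worked by hand. The Python sketch below uses the textbook formula (contrast estimate divided by its standard error, with the pooled within-group mean square) on hypothetical data; it is not PSPP's implementation, and the order of the coefficients has to match the order of the groups, which is worth checking against the ONEWAY output.

```python
import math


def contrast_t(groups, coeffs):
    """t statistic and degrees of freedom for a planned contrast,
    assuming equal variances across groups."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means = [sum(g) / len(g) for g in groups]
    # Pooled within-group variance (mean square within).
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df = sum(len(g) for g in groups) - len(groups)
    ms_within = ss_within / df
    estimate = sum(c * m for c, m in zip(coeffs, means))
    std_err = math.sqrt(ms_within * sum(c * c / len(g) for c, g in zip(coeffs, groups)))
    return estimate / std_err, df


# Hypothetical DIFF values (days) for four developers, in group order.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [6, 7, 8]]

t1, df = contrast_t(groups, [-3, 1, 1, 1])  # first group against the rest
t2, _ = contrast_t(groups, [0, 1, -1, 0])   # second group against the third
print(t1, t2, df)
```

Each t value is then compared against a t distribution with `df` degrees of freedom to get the significance level.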
*****************************************************************************
Anyway, this exercise uncovered a few bugs in PSPP, some of which I've plugged.
The others are:
1. In the first FREQUENCIES table, only the "Removed" label is displayed.
For some reason the "Modified" and "Added" labels are not displayed.
2. I should be able to replace the lines:
SPLIT FILE BY WHO.
FREQUENCIES /RTYPE.
SPLIT FILE OFF.
with
TEMPORARY.
SPLIT FILE BY WHO.
FREQUENCIES /RTYPE.
When I tried it, it segfaulted.
3. The LIST command produces 50 pages of rather uninteresting numbers,
so I commented it out. Strangely, when I do this, all the following
commands are ignored.
4. The XDATE.JDAY function doesn't seem to behave as the manual explains it.
The manual says it gives a number between 1 and 366 indicating the number
of days from the start of the year. In fact it seems to give the number of
seconds since some arbitrary epoch (which happened to be what I wanted in
this instance).
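To show why the difference matters here, this Python sketch contrasts a day-of-year value with a count of seconds since a fixed epoch. The epoch used (midnight, 14 October 1582) is an assumption made for illustration, and the function names are hypothetical; neither is taken from the PSPP source.

```python
from datetime import datetime

# Assumed epoch, for illustration only: midnight, 14 October 1582.
EPOCH = datetime(1582, 10, 14)


def day_of_year(d):
    """1..366: what the manual describes for XDATE.JDAY."""
    return d.timetuple().tm_yday


def seconds_since_epoch(d):
    """Seconds elapsed since EPOCH: what the function appeared to return."""
    return (d - EPOCH).total_seconds()


earlier = datetime(2004, 11, 12)
later = datetime(2005, 1, 10)

# Day-of-year differences break as soon as the dates straddle a new year.
doy_gap = day_of_year(later) - day_of_year(earlier)

# Differences of epoch seconds, divided by 3600 and 24, give elapsed days.
sec_gap = (seconds_since_epoch(later) - seconds_since_epoch(earlier)) / 3600 / 24

print(doy_gap, sec_gap)
```

With a true day-of-year value, DIFF would go wrong for any pair of commits in different years, so the documented behaviour would not have suited this analysis.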
Full text of the program below:
TITLE 'Level of Developer Contribution to PSPP'.
* cvs-hist generated by cvs history -c -a.
DATA LIST FIXED
FILE='cvs-hist'
/RTYPE 1-1 (A)
WHEND 3-12 (SDATE)
WHENT 14-18 (TIME)
WHO 26-33 (A)
WHAT 40-70 (A)
.
VALUE LABELS /RTYPE 'A' 'Added' 'M' 'Modified' 'R' 'Removed'.
VARIABLE LABEL WHAT 'Filename'.
VARIABLE LABEL WHO 'Developer'.
VARIABLE LABEL WHEND 'Date'.
SPLIT FILE BY WHO.
FREQUENCIES /RTYPE.
SPLIT FILE OFF.
COMPUTE T= XDATE.JDAY(WHEND).
COMPUTE WHAT=RTRIM(WHAT).
SORT CASES BY WHAT, T (D).
SPLIT FILE BY WHAT.
* Number of days between file modifications.
COMPUTE DIFF = LAG(T) - T.
COMPUTE DIFF = DIFF / 3600 / 24.
VARIABLE LABEL DIFF 'Time between modification'.
LIST.
SPLIT FILE OFF.
SELECT IF (DIFF > 0).
EXAMINE DIFF
/STATISTICS = DESCRIPTIVES
/PLOT = NPPLOT
.
EXAMINE DIFF BY WHO
/STATISTICS = DESCRIPTIVES
/NOTOTAL
.
ONEWAY DIFF BY WHO
/STATISTICS = HOMOGENEITY
/CONTRASTS = -3, 1, 1, 1
/CONTRASTS = 0, 1, -1, 0
.
EXECUTE.
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://wwwkeys.pgp.net or any PGP keyserver for public key.