Re: K-Means Clustering
From: John Darrington
Subject: Re: K-Means Clustering
Date: Tue, 15 Mar 2011 09:23:34 +0000
User-agent: Mutt/1.5.18 (2008-05-17)
On Mon, Mar 14, 2011 at 12:22:36PM -0700, Mehmet Hakan Satman wrote:
> Hi John,
> 1) I renamed the file as "quick-cluster.c"
> 2. I added an entry to "src/language/stats/automake.mk" for quick-cluster
> 3. I removed the entry "UNIMPL_CMD ("QUICK CLUSTER", "Fast clustering")"
> from command.def file.
Thanks. I tried some experiments with it. It looks promising, but there are
some improvements which can be made.
> 4. Now cmd_quick_cluster can parse a command line like:
> QUICK CLUSTER x y z
>     /CRITERIA = CLUSTER(5) MXITER (100).
I inadvertently ran it with the wrong syntax (I typed just "QUICK CLUSTER."
without any variables), and it caused PSPP to crash. You should check the
return value of parse_variables_const and return an error if it fails. See
the code for the other procedures to see how to do this.
It also crashed if I omitted the /CRITERIA subcommand, because your algorithm
expects the number of groups to be greater than 0. The SPSS documentation says
that the CLUSTER and MXITER parameters both default to 2, so you should
initialise them accordingly.
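The two fixes above could be sketched roughly as follows. This is only an illustration, not code from quick-cluster.c: the struct `qc_settings`, its fields, and both function names are hypothetical, and the `vars_parsed` flag merely stands in for whatever parse_variables_const returns in the real command.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical settings struct; the names are illustrative only. */
struct qc_settings
{
  int ngroups;                  /* Value given to CLUSTER(n). */
  int maxiter;                  /* Value given to MXITER(n). */
};

/* Give both parameters the default quoted in the message (2), so that
   omitting /CRITERIA never leaves them at zero or uninitialised. */
void
qc_settings_init (struct qc_settings *qc)
{
  qc->ngroups = 2;
  qc->maxiter = 2;
}

/* Decide whether the command may proceed.  VARS_PARSED stands in for the
   result of the variable parser; N_VARS is how many variables it found. */
bool
qc_settings_ok (bool vars_parsed, size_t n_vars,
                const struct qc_settings *qc)
{
  if (!vars_parsed || n_vars == 0)
    return false;               /* e.g. "QUICK CLUSTER." with no variables. */
  return qc->ngroups > 0 && qc->maxiter > 0;
}
```

In the real command, a false result would presumably lead to returning a failure code from cmd_quick_cluster instead of continuing into the algorithm.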
> As I mentioned, I tested my results on uniformly distributed random data.
> This cannot be considered comprehensive, and it should be tested with
> simulations.
It's not my field of expertise, but I ran it with the following syntax:
input program.
loop #i = 1 to 100000.
compute x = rv.uniform (0, 1).
end case.
end loop.
end file.
end input program.
QUICK CLUSTER ALL
/CRITERIA = CLUSTER(3) MXITER (100).
and got:
Centers:
Center of Group 1: 0.499
Center of Group 2: 0.833
Center of Group 3: 0.165
which is close to what I would expect (the centres are 1/6, 3/6 and 5/6). Can
you suggest some more comprehensive tests?
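One possible deterministic check, offered only as a sketch: for a uniform density on [0, 1] split into three equal intervals, each centre is the midpoint of its interval, which gives 1/6, 1/2 and 5/6. The code below runs a plain Lloyd iteration (a textbook k-means update, not necessarily what quick-cluster.c does) on an evenly spaced grid, so the result does not depend on a random seed. The names `lloyd_1d`, `fill_uniform_grid` and `MAX_K` are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_K 8                 /* Arbitrary upper limit for this sketch. */

/* Fill x[0..n-1] with the midpoints of n equal cells of [0, 1]. */
void
fill_uniform_grid (double *x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    x[i] = (i + 0.5) / n;
}

/* Plain Lloyd iteration on 1-D data.  CENTRES holds K starting values on
   entry and the final centres on return.  Returns 0 on success, or -1 if
   K is outside 1..MAX_K. */
int
lloyd_1d (const double *x, size_t n, double *centres, size_t k, int maxiter)
{
  if (k == 0 || k > MAX_K)
    return -1;

  for (int iter = 0; iter < maxiter; iter++)
    {
      double sum[MAX_K] = { 0.0 };
      size_t count[MAX_K] = { 0 };
      bool moved = false;

      /* Assignment step: add each point to its nearest centre's totals. */
      for (size_t i = 0; i < n; i++)
        {
          size_t best = 0;
          double best_d = (x[i] - centres[0]) * (x[i] - centres[0]);
          for (size_t j = 1; j < k; j++)
            {
              double d = (x[i] - centres[j]) * (x[i] - centres[j]);
              if (d < best_d)
                {
                  best_d = d;
                  best = j;
                }
            }
          sum[best] += x[i];
          count[best]++;
        }

      /* Update step: move each non-empty centre to the mean of its points. */
      for (size_t j = 0; j < k; j++)
        if (count[j] > 0)
          {
            double updated = sum[j] / count[j];
            if (updated != centres[j])
              moved = true;
            centres[j] = updated;
          }

      if (!moved)
        break;                  /* No centre changed, so stop early. */
    }
  return 0;
}
```

Calling `fill_uniform_grid` on a few thousand points and then `lloyd_1d` with starting centres such as 0.1, 0.5 and 0.9 should leave the centres close to 1/6, 1/2 and 5/6. The same harness could be pointed at data with known, well-separated groups in more than one variable.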
I have some general suggestions about the quick-cluster.c file:
1. The formatting style doesn't really fit the GNU way of doing things. I
recommend that you run the command "indent --gnu-style quick-cluster.c" to
make it more consistent with the rest of the code. You might want to read the
information at http://www.gnu.org/prep/standards/standards.html which explains
how GNU software is normally written and why we do it that way.
2. When compiling, I get a number of warnings. Most of these are due to
missing "static" qualifiers from the functions.
3. In PSPP we don't use the stdlib "malloc". Instead we use "xmalloc" from
gnulib.
4. Similarly, we don't use the srand and rand functions. Use the gsl_rng_*
functions. These are supposed to be better random number generators. See the
file src/language/xforms/sample.c and/or the gsl manual for an example.
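To illustrate point 3: the attraction of xmalloc is that callers never need to test for NULL, because the program is terminated if memory runs out. The function below is a hypothetical stand-in named `checked_malloc`, written only to show that behaviour; it is not the gnulib implementation, and real PSPP code would call xmalloc itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an xmalloc-style allocator: it either returns
   usable memory or terminates the program, so it never returns NULL. */
void *
checked_malloc (size_t size)
{
  /* Ask for at least one byte so a zero-size request cannot yield NULL. */
  void *p = malloc (size != 0 ? size : 1);
  if (p == NULL)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return p;
}
```

With such a wrapper, a call like `double *centres = checked_malloc (n * sizeof *centres);` can be used immediately, without a NULL check after it.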
I'm looking forward to seeing the QUICK CLUSTER command integrated into PSPP.
I tried to find some examples of how SPSS presents its output for this command
but I couldn't find any. Do you have any such examples, or do you have access
to a copy of SPSS, so that we can see how users might expect to see the
results?
Regards,
John
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.