pspp-dev
K-means cluster center order


From: Alan Mead
Subject: K-means cluster center order
Date: Sat, 30 May 2015 17:38:26 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

I've uploaded a patch (against quick-cluster.c in 0.8.4)  that adds
support for the /PRINT=CLUSTER subcommand for k-means clustering to show
the cluster membership for each case:

https://savannah.gnu.org/bugs/index.php?41019

But this patch has a remaining bug.  The cluster centers are stored in
some indirect order that I cannot work out.

In the patch, I report the cluster number returned by
kmeans_get_nearest_group(), but these numbers differ systematically
from the cluster numbers shown in the CLUSTER output.  That is, the
centers seem to be stored internally in arbitrary order (as they are
discovered, I'd guess) and are renumbered for reporting.  I cannot
replicate that output numbering.

For example, in the attached output, the true centers were (10,10),
(-10,-10), and (-10,10), and 20 cases were generated for each cluster.
The CLUSTER command reports 1 = (-10.23, -10.01), 2 = (-10.19, 10.18),
and 3 = (10.27, 9.82), so the first 20 cases should be members of
cluster 3, the next 20 of cluster 1, and the last 20 of cluster 2.  But
using the results from kmeans_get_nearest_group(), the clusters are
reported as 1, then 3, then 2.

I don't understand how I can fix this.  I think I need to use
kmeans->group_order which is a "gsl_permutation" but this is beyond my
familiarity with C and GSL.
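
One possible shape for the fix, as a minimal sketch only.  It assumes
(not confirmed against quick-cluster.c) that group_order maps each
reported position to the internal index of a center, so that the
membership output needs the inverse mapping.  A plain size_t array
stands in for the gsl_permutation here; with GSL, the inverse could
presumably be built with gsl_permutation_inverse() instead.  The names
invert_order and reported_cluster are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: order[k] is the internal index of the center that is
   reported as cluster k + 1.  Fill inverse[] so that inverse[g] is the
   0-based reported position of internal group g. */
void
invert_order (const size_t *order, size_t n, size_t *inverse)
{
  for (size_t k = 0; k < n; k++)
    {
      assert (order[k] < n);   /* order[] must be a permutation of 0..n-1 */
      inverse[order[k]] = k;
    }
}

/* Translate an internal group index (such as the value assumed to come
   back from kmeans_get_nearest_group()) into the 1-based reported
   cluster number. */
size_t
reported_cluster (const size_t *inverse, size_t internal_group)
{
  return inverse[internal_group] + 1;
}
```

For the example above, if the internal indices are 0 = (10,10),
1 = (-10,10), and 2 = (-10,-10), the order array would be {2, 1, 0},
and internal groups 0, 2, 1 would be relabeled as clusters 3, 1, 2.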

It's also possible that kmeans_order_groups() (which is called at the
beginning of quick_cluster_show_results()) is not working properly.

Any advice?

-Alan

-- 

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

+815.588.3846 (Office)
+267.334.4143 (Mobile)

http://www.alanmead.org

Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat

Attachment: patch_for_cluster_print.patch
Description: Text document

Attachment: qc.pdf
Description: Adobe PDF document

Attachment: qc.sps
Description: application/spss-sps

Attachment: qc1.data
Description: Text document

