pspp-dev

Re: Are these bugs in cluster?


From: Ben Pfaff
Subject: Re: Are these bugs in cluster?
Date: Sat, 30 May 2015 16:24:47 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

On Sat, May 30, 2015 at 09:13:16AM +0200, John Darrington wrote:
> On Fri, May 29, 2015 at 09:33:30AM -0500, Alan Mead wrote:
>      John suggested that I post to pspp-dev.  I'm adding code to the k-means
>      (i.e., quick-cluster.c) procedure to show cluster membership.
>      
>      CLUSTER works perfectly on a trivial two-dimensional problem but it
>      fails miserably on some real data. For example, in one analysis
>      requesting 3 clusters on 98 cases, it found that everyone was in cluster
>      3 and zero people were in clusters 1 & 2.  I think part of the problem is that
>      the starting values seem to be a pattern of 1s and 0s, even though
>      the comments describe selecting random individuals as starting values.
>      
>      My question is about accessing the data.  I copied other code to use a
>      "casereader" to iterate over the rows of data. Below are the relevant
>      parts of the code I've added, which seems to display cluster membership.
>      If I want to randomly select cases as starting values, is there a way to
>      retrieve random records directly?
>      
> 
> Ben is the casereader expert!  Maybe he can comment?  But I think you might 
> be able to use the function casereader_select (defined in casereader-select.c):
> 
> casereader_select (subreader, random_number - 1, random_number + 1, 1);
> 
> You would have to ensure that random_number was within the range of subreader.

That seems reasonable to me.
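Another way to pick k random starting cases without random access is to draw k distinct case numbers in increasing order up front, then collect those cases during a single sequential pass over the casereader. The sketch below does this with Knuth's selection sampling (Algorithm S). It is plain C, not PSPP code: `choose_sorted_indices` is a hypothetical helper, it assumes the number of cases N is already known, and it uses the C library's `rand ()` where PSPP code would presumably use the project's own random-number generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PSPP): chooses K distinct case
   numbers from 0 ... N-1 uniformly at random and stores them in OUT
   in increasing order, using selection sampling (Knuth's Algorithm S).
   Because the numbers come out sorted, one sequential pass over the
   data is enough to collect every chosen case.  If K > N, only N
   numbers are chosen.  Returns the number of entries written. */
static size_t
choose_sorted_indices (size_t n, size_t k, size_t *out)
{
  size_t chosen = 0;

  if (k > n)
    k = n;
  assert (k == 0 || out != NULL);

  for (size_t i = 0; i < n && chosen < k; i++)
    {
      size_t needed = k - chosen;   /* cases still to pick */
      size_t remaining = n - i;     /* cases not yet considered, incl. I */

      /* Pick case I with probability NEEDED / REMAINING.  U is in
         [0, 1), so when REMAINING == NEEDED the case is always taken,
         which guarantees exactly K picks. */
      double u = (double) rand () / ((double) RAND_MAX + 1.0);
      if (u * (double) remaining < (double) needed)
        out[chosen++] = i;
    }
  return chosen;
}
```

For the 98-case, 3-cluster example above, `choose_sorted_indices (98, 3, idx)` fills `idx` with three distinct case numbers below 98 in ascending order. The design choice is to read the data once: if each casereader_select call has to read forward from the first case (an assumption about its cost), selecting k cases one at a time could take up to k passes instead of one.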

If clustering actually wants a shuffled version of the complete data set
(I don't know if that is true?) then probably more efficient algorithms
are available.
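If the procedure does need the whole data set in random order, one standard O(n) technique (an assumption here; the thread does not name one) is the Fisher-Yates shuffle, sketched below on an array of case numbers. `shuffle_case_numbers` and `uniform_index` are hypothetical names, `rand ()` again stands in for whatever generator PSPP would use, and turning the shuffled numbers back into cases (for example, after loading the cases into memory) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns an integer in [0, BOUND), where BOUND > 0, by scaling rand ().
   Resolution is limited by RAND_MAX (possibly only 32767), so this is
   only adequate when BOUND is small, as in this sketch. */
static size_t
uniform_index (size_t bound)
{
  double u = (double) rand () / ((double) RAND_MAX + 1.0); /* 0 <= u < 1 */
  size_t j = (size_t) (u * (double) bound);
  return j < bound ? j : bound - 1; /* guard against rounding up */
}

/* Hypothetical helper (not part of PSPP): fills V with the case numbers
   0 ... N-1 in random order using the Fisher-Yates (Durstenfeld)
   shuffle, which takes O(N) time and no memory beyond V itself. */
static void
shuffle_case_numbers (size_t *v, size_t n)
{
  assert (n == 0 || v != NULL);

  for (size_t i = 0; i < n; i++)
    v[i] = i;

  /* Working from the end, swap each slot with a randomly chosen slot at
     or before it; slots from I onward are already final. */
  for (size_t i = n; i > 1; i--)
    {
      size_t j = uniform_index (i);   /* 0 <= j <= i - 1 */
      size_t tmp = v[i - 1];
      v[i - 1] = v[j];
      v[j] = tmp;
    }
}
```

Taking the first k entries of the shuffled array would also give k distinct random starting cases, at the cost of storing all n case numbers. For large n, a generator with more random bits than `rand ()` guarantees would be needed for the shuffle to be uniform.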


