pspp-dev

Re: K-Means Clustering


From: John Darrington
Subject: Re: K-Means Clustering
Date: Sat, 12 Mar 2011 05:55:52 +0000
User-agent: Mutt/1.5.18 (2008-05-17)

In PSPP the data is accessed through objects called casereaders.  There are some 
restrictions on what you can do with a casereader; these restrictions are 
intentional and make it possible to deal with very large datasets.

You can iterate and print out the data with a code fragment like this:

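  /* Assumes `variables' and `n' are the ones filled in by
     parse_variables_const in your function.  */
  struct ccase *c;
  bool ok;

  /* Open the active dataset for reading.  */
  struct casereader *input = proc_open (ds);

  /* casereader_read returns NULL at end of data; each case is
     released with case_unref once we are done with it.  */
  for (; (c = casereader_read (input)) != NULL; case_unref (c))
    {
      int v;
      for (v = 0; v < n; ++v)
        {
          /* Numeric values are stored as doubles in the `f' member.  */
          double x = case_data (c, variables[v])->f;
          printf ("%g\t", x);
        }
      printf ("\n");
    }

  /* Both calls report whether an error occurred.  */
  ok = casereader_destroy (input);
  ok = proc_commit (ds) && ok;

  return ok;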


I haven't actually tested the above code, so you may have to debug it!
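
Since you want the values as doubles for K-means, here is a rough sketch of what one pass of the algorithm could look like when the data can only be read forwards, one row at a time.  It is not PSPP code: struct row_source and row_source_read are hypothetical stand-ins for a casereader, so that the fragment compiles on its own.  In the real command the row would be filled from case_data (c, variables[v])->f inside the casereader_read loop above, and each further pass would need a fresh reader (I believe casereader_clone is meant for that, but check the headers).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical stand-in for a casereader (not part of PSPP): hands out
   one row of doubles at a time, forward only.  */
struct row_source
  {
    const double *data;   /* Row-major values, n_rows * n_vars.  */
    size_t n_rows;
    size_t n_vars;
    size_t pos;           /* Index of the next row to hand out.  */
  };

/* Returns the next row, or NULL when the source is exhausted.  */
const double *
row_source_read (struct row_source *src)
{
  if (src->pos >= src->n_rows)
    return NULL;
  return src->data + src->n_vars * src->pos++;
}

/* Returns the index of the center closest to ROW, using squared
   Euclidean distance.  CENTERS is k * n_vars, row-major.  */
size_t
nearest_center (const double *row, const double *centers,
                size_t k, size_t n_vars)
{
  size_t best = 0;
  double best_dist = DBL_MAX;

  for (size_t g = 0; g < k; g++)
    {
      double dist = 0.0;
      for (size_t v = 0; v < n_vars; v++)
        {
          double d = row[v] - centers[g * n_vars + v];
          dist += d * d;
        }
      if (dist < best_dist)
        {
          best_dist = dist;
          best = g;
        }
    }
  return best;
}

/* One K-means pass.  Reads every row exactly once, assigning it to the
   nearest of the current centers, then overwrites each center with the
   mean of its rows.  SUMS (k * n_vars) and COUNTS (k) are scratch
   space supplied by the caller.  A center with no rows is left
   unchanged.  */
void
kmeans_pass (struct row_source *src, double *centers, size_t k,
             double *sums, size_t *counts)
{
  size_t n_vars = src->n_vars;
  const double *row;

  assert (k > 0 && n_vars > 0);

  for (size_t i = 0; i < k * n_vars; i++)
    sums[i] = 0.0;
  for (size_t g = 0; g < k; g++)
    counts[g] = 0;

  while ((row = row_source_read (src)) != NULL)
    {
      size_t g = nearest_center (row, centers, k, n_vars);
      for (size_t v = 0; v < n_vars; v++)
        sums[g * n_vars + v] += row[v];
      counts[g]++;
    }

  for (size_t g = 0; g < k; g++)
    if (counts[g] > 0)
      for (size_t v = 0; v < n_vars; v++)
        centers[g * n_vars + v] = sums[g * n_vars + v] / counts[g];
}
```

The point of the sketch is that only the k centers and their running sums are kept in memory, never the whole dataset, which fits the way casereaders are meant to be used.  How the initial centers are chosen and when to stop iterating are left open here.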

Hope this helps.

J'

On Fri, Mar 11, 2011 at 12:28:09AM -0800, Mehmet Hakan Satman wrote:
     Hi friends,
     
     I am trying to implement a function like this
     
     int
     cmd_quick_cluster (struct lexer *lexer, struct dataset *ds)
     {
        const struct dictionary *dict = dataset_dict (ds);
        struct variable *v = dict_get_weight (dict);
        struct variable **variables;
        int n;
        lex_match (lexer, T_SLASH);
        if (!lex_force_match_id (lexer, "VARIABLES"))
          printf("Variables must be defined");
        lex_match(lexer, T_EQUALS);
        if (!parse_variables_const (lexer, dict, &variables, &n, PV_NUMERIC))
          printf("Cannot parse variables");
        printf("Number of variables :%d\n",n);
        return(CMD_SUCCESS);
     }
     
     for K-means clustering. At the least, the /VARIABLES and /GROUPS
     parameters probably need to be implemented in the QUICK CLUSTER command.
     I have learnt how to parse the command line; as you can see, I can hold
     the number of variables and the selected variables in a double-pointer
     (struct variable **) variable.
     
     Because of the nearly perfect abstraction, I can't reach the data itself.
     I need to handle the data as doubles. What is the easiest way to grab the
     raw values from datasets?
     
           
-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
