pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: linked lists and trees


From: Ben Pfaff
Subject: Re: linked lists and trees
Date: Wed, 24 Sep 2008 09:31:02 -0700
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)

Jason Stover <address@hidden> writes:

> I have a question about implementing linked lists or trees.  I'm not
> sure which to use. Also, I don't know if I should write my own or if there
> are any in the pspp source I can already use.

We have implementations of all of these in libpspp: linked lists
in ll.[ch] and llx.[ch] and trees in bt.[ch] and abt.[ch].  We
also have one form of hash tables in hash.[ch], and I'm actively
working on a better hash implementation, that I will probably put
out for review in a few days.

> Here are the details:
>
> I'm writing a one-pass function to compute the covariance matrix.  It
> needs to be able to handle categorical variables before the number of
> categories is known, and that isn't known until after the data pass.
>
> I have a solution to this problem. It involves using a "simple"
> temporary data structure to keep the values for the covariance matrix
> until the data pass is complete, then it will translate the values and
> put them in the proper matrix data structure.
>
> I'm not sure how to make this "simple" temporary data structure.
> I started out making a linked list, then thought maybe I should use
> a tree. 

The key questions you should be asking are:

        - How much data can I end up with?  If it is potentially
          more than can fit in memory, maybe you shouldn't insist
          on doing it in one pass.  (But if the operations to be
          performed on it essentially require random access, then
          it's still reasonable to do it all in-memory in a
          single pass.)

(OK, suppose you've concluded that you still want to do it
in-memory in one pass.  Then continue:)

        - What are the operations I want to perform on the data
          structure?  In particular, what do you want to look up
          a node based on?  Is the key a single value or a range?
          Does the data consist of single values or ranges?  Do
          you care whether the data is in sorted order for most
          operations?

My guess is that you want to look up nodes based on the exact
value of a key, and don't need the nodes to be in sorted order
for that.  Then, the proper data structure is a hash table.  A
tree would work too, but a hash table will be asymptotically
faster.

> One node in the tree would look like this:
>
> struct node
> {
>   struct variable *v1;
>   struct variable *v2;
>   double covariance_element;
>   double center;
>   union value *val1;
>   union value *val2;
>   size_t count;
>   struct node *right;
>   struct node *left;
> };

Which members are the key that you want to look up to find a
node?

> The reason I didn't want to use a tree is that I don't want to
> recursively search it. It could be quite large, and I remember
> overflowing the stack in the past by recursing many times (say, 5000)
> in a Lisp program. But I'm not writing in Lisp. So: how much stack
> space do I have to recurse?

You have lots of stack space.  But binary trees doesn't require
recursion (it's just an obvious implementation technique), and
the implementations in src/libpspp don't recurse at all.

> And is there already an implementation of trees or linked lists in, say,
> src/data?

src/libpspp
-- 
RMS on DRM: "This ought to be a crime. And, if we had governments of the
people, by the people, for the people, then the executives of those companies
would be in prison.  But they're not in prison, and the reason is that we have
government of the people, by the sell-outs, for the corporations."




reply via email to

[Prev in Thread] Current Thread [Next in Thread]