pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Bug in datasheet.c ?


From: Ben Pfaff
Subject: Re: Bug in datasheet.c ?
Date: Thu, 02 Aug 2007 17:15:12 -0700
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

John Darrington <address@hidden> writes:

> 4.   Type in a command, which does NOT cause data to be read (eg
>      DISPLAY DICTIONARY or ECHO 'hello' )
>
> 5.   Run that command (using the Run menu).
>
> Every time you Run it, the refresh gets slower and slower until after
> about 6 or 7 times, it's ground to a halt.  Doing something that loads
> fresh data fixes the problem --- it becomes nice and fast again.

The problem is this: on each trip through execute_syntax, the
active file's data is encapsulated in a new datasheet that gets
created by the call to psppire_case_file_new.  This is fine.  The
performance issue comes in this way: if the data was not read by
the syntax that was executed, however, the active file's data is
a casereader created by datasheet_make_reader.  Thus, the new
datasheet gets its data from a casereader that comes from the old
datasheet.  If you do this 6 or 7 times, then you end up with 6
or 7 layers that have to be traversed every single time you read
a value from the data sheet.

One "solution" is to insert the following in execute_syntax,
above the call to psppire_case_file_new:
    casereader_destroy (proc_open (the_dataset));
    proc_commit (the_dataset);
This will force the data to be put into a new data source that is
not backed by a datasheet.  It's not a very good solution,
though, since it adds an otherwise unhelpful pass through the
data.

A better solution would be to detect that the data set has not
been changed by the syntax that we executed.  In that case, we
don't need to re-encapsulate it in a new datasheet.  I'm
appending an untested patch against procedure.[ch] that would
allow us to detect that.  An alternative would be to use the
replace_data callback that is currently scheduled for removal in
patch #6075.

A tempting way to do it would be to make datasheet_create detect
that its argument is a casereader created from a datasheet and in
that case just "snap the link": return the underlying datasheet.
But that seems like a bad idea, since the casereader might have
clones, and if you modify the data in the underlying datasheet
you'd corrupt the clones' data from their point of view.  It also
would require a new interface that would allow a client to find
out how a casereader is implemented, and thus far I'm happy that
we haven't needed such an interface.

By the way, it seems that we could do
        proc_set_active_file_data (the_dataset, NULL);
toward the end of execute_syntax.  If we can, it would be a
good idea, because then the casereader underlying the
PsppireCaseFile will likely have a reference count of 1, which
makes it possible to free up its memory or disk space should
all of its cases or variables be deleted from the datasheet by
the user.

Index: merge/src/data/procedure.c
===================================================================
--- merge.orig/src/data/procedure.c     2007-08-02 15:45:20.000000000 -0700
+++ merge/src/data/procedure.c  2007-08-02 15:53:21.000000000 -0700
@@ -55,6 +55,9 @@ struct dataset {
   struct trns_chain *temporary_trns_chain;
   struct dictionary *dict;
 
+  /* Number of times the source has changed. */
+  unsigned int source_change_count;
+
   /* Callback which occurs when a procedure provides a new source for
      the dataset */
   replace_source_callback *replace_source ;
@@ -116,6 +119,15 @@ time_of_last_procedure (struct dataset *
     update_last_proc_invocation (ds);
   return ds->last_proc_invocation;
 }
+
+/* Returns the number of times that the data source has changed.
+   The return value can be used to detect whether the active file
+   has been read (or otherwise changed) between a pair of calls. */
+unsigned int
+proc_get_source_change_count (const struct dataset *ds) 
+{
+  return ds->source_change_count;
+}
 
 /* Regular procedure. */
 
@@ -340,6 +352,7 @@ proc_commit (struct dataset *ds)
     }
   ds->sink = NULL;
   if ( ds->replace_source) ds->replace_source (ds->source);
+  ds->source_change_count++;
 
   caseinit_clear (ds->caseinit);
   caseinit_mark_as_preinited (ds->caseinit, ds->dict);
@@ -588,6 +601,7 @@ proc_discard_active_file (struct dataset
   casereader_destroy (ds->source);
   ds->source = NULL;
   if ( ds->replace_source) ds->replace_source (NULL);
+  ds->source_change_count++;
 
   proc_cancel_all_transformations (ds);
 }
@@ -619,6 +633,7 @@ proc_set_active_file_data (struct datase
   casereader_destroy (ds->source);
   ds->source = reader;
   if (ds->replace_source) ds->replace_source (reader);
+  ds->source_change_count++;
 
   caseinit_clear (ds->caseinit);
   caseinit_mark_as_preinited (ds->caseinit, ds->dict);
Index: merge/src/data/procedure.h
===================================================================
--- merge.orig/src/data/procedure.h     2007-08-02 15:51:11.000000000 -0700
+++ merge/src/data/procedure.h  2007-08-02 15:54:05.000000000 -0700
@@ -69,6 +69,7 @@ void proc_discard_output (struct dataset
 
 bool proc_execute (struct dataset *ds);
 time_t time_of_last_procedure (struct dataset *ds);
+unsigned int proc_get_source_change_count (const struct dataset *);
 
 struct casereader *proc_open (struct dataset *);
 bool proc_is_open (const struct dataset *);
-- 
Ben Pfaff 
http://benpfaff.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]