pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

fix for reading funny compressed data, for review


From: Ben Pfaff
Subject: fix for reading funny compressed data, for review
Date: Wed, 14 Oct 2009 21:44:52 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

I'd like to push this to the stable branch.  Comments
appreciated.

commit e624e2da6ea68d22e6d4fba4eaa96d37d07a6730
Author: Ben Pfaff <address@hidden>
Date:   Wed Oct 14 21:20:44 2009 -0700

    sys-file-reader: Tolerate nonsensical opcodes in compressed data.
    
    Compressed data in .sav files uses a set of 256 opcodes, some of which make
    sense only for numeric data and others of which only make sense for string
    data.  However, Jereme Thomas <address@hidden> has provided one
    file, written by SPSS 14, that uses an opcode that seems to makes sense
    only for numeric data in a string field.  So this commit adds support for
    these opcodes, although it still warns about the ones other than the exact
    one found in the file provided by Jereme.

diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
index 70fa385..b1be385 100644
--- a/doc/dev/system-file-format.texi
+++ b/doc/dev/system-file-format.texi
@@ -884,6 +884,9 @@ value @var{code} - @var{bias}, where
 variable @code{bias} from the file header.  For example,
 code 105 with bias 100.0 (the normal value) indicates a numeric variable
 of value 5.
+One file has been seen written by SPSS 14 that contained such a code
+in a @emph{string} field with the value 0 (after the bias is
+subtracted) as a way of encoding null bytes.
 
 @item 252
 End of file.  This code may or may not appear at the end of the data
diff --git a/src/data/sys-file-reader.c b/src/data/sys-file-reader.c
index fe7b533..8d973e4 100644
--- a/src/data/sys-file-reader.c
+++ b/src/data/sys-file-reader.c
@@ -86,6 +86,7 @@ struct sfm_reader
     double bias;               /* Compression bias, usually 100.0. */
     uint8_t opcodes[8];         /* Current block of opcodes. */
     size_t opcode_idx;          /* Next opcode to interpret, 8 if none left. */
+    bool corruption_warning;    /* Warned about possible corruption? */
   };
 
 static const struct casereader_class sys_file_casereader_class;
@@ -192,6 +193,7 @@ sfm_open_reader (struct file_handle *fh, struct dictionary 
**dict,
   r->oct_cnt = 0;
   r->has_long_var_names = false;
   r->opcode_idx = sizeof r->opcodes;
+  r->corruption_warning = false;
 
   /* TRANSLATORS: this fragment will be interpolated into
      messages in fh_lock() that identify types of files. */
@@ -1374,7 +1376,14 @@ read_compressed_number (struct sfm_reader *r, double *d)
       break;
 
     case 254:
-      sys_error (r, _("Compressed data is corrupt."));
+      float_convert (r->float_format, "        ", FLOAT_NATIVE_DOUBLE, d);
+      if (!r->corruption_warning)
+        {
+          r->corruption_warning = true;
+          sys_warn (r, _("Possible compressed data corruption: "
+                         "compressed spaces appear in numeric field."));
+        }
+      break;
 
     case 255:
       *d = SYSMIS;
@@ -1395,7 +1404,8 @@ read_compressed_number (struct sfm_reader *r, double *d)
 static bool
 read_compressed_string (struct sfm_reader *r, char *dst)
 {
-  switch (read_opcode (r))
+  int opcode = read_opcode (r);
+  switch (opcode)
     {
     case -1:
     case 252:
@@ -1410,7 +1420,25 @@ read_compressed_string (struct sfm_reader *r, char *dst)
       break;
 
     default:
-      sys_error (r, _("Compressed data is corrupt."));
+      {
+        double value = opcode - r->bias;
+        float_convert (FLOAT_NATIVE_DOUBLE, &value, r->float_format, dst);
+        if (value == 0.0)
+          {
+            /* This has actually been seen "in the wild".  The submitter of the
+               file that showed that the contents decoded as spaces, but they
+               were at the end of the field so it's possible that the null
+               bytes just acted as null terminators. */
+          }
+        else if (!r->corruption_warning)
+          {
+            r->corruption_warning = true;
+            sys_warn (r, _("Possible compressed data corruption: "
+                           "string contains compressed integer (opcode %d)"),
+                      opcode);
+          }
+      }
+      break;
     }
 
   return true;

-- 
Peter Seebach on managing engineers:
"It's like herding cats, only most of the engineers are already
 sick of laser pointers."




reply via email to

[Prev in Thread] Current Thread [Next in Thread]