[gnuastro-commits] master c75c40e: Library (txt.h & fits.h): Properly ac

gnuastro-commits
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnuastro-commits] master c75c40e: Library (txt.h & fits.h): Properly ac

From:	Mohammad Akhlaghi
Subject:	[gnuastro-commits] master c75c40e: Library (txt.h & fits.h): Properly accounting for blank strings
Date:	Tue, 4 Aug 2020 20:21:21 -0400 (EDT)
branch: master
commit c75c40e9c1e357de42199d1d35ec5ad6ff8a34b9
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Library (txt.h & fits.h): Properly accounting for blank strings
    
    Until now, the blank value specified for plain-text numeric columns had to
    be a number in the same type. This was inconvenient because we would have
    to specify unique numbers based on the column type.
    
    With this commit, its now possible to specify strings as blank values of
    numeric columns. With this feature, we can now specify the same string for
    all numeric column types and just give them in the column metadata.
    
    While doing this, I also found a bug in writing FITS tables: until now for
    non-standard FITS types (int8_t, uint16_t, uint32_t and uint64_t), we would
    simply write the blank value in the TNULL keyword. However, according to
    the FITS standard, TNULL is checked before TZERO or TSCAL are applied to
    these non-standard data types. As a result, when reading from that FITS
    table, the blank elements aren't specified as blank any more! So this has
    also been corrected.
    
    This bug was reported by Samane Raji.
    
    This fixes bug #58901.
---
 NEWS                                |   7 ++
 doc/gnuastro.texi                   |   6 +-
 lib/fits.c                          |  34 +++++-
 lib/gnuastro-internal/tableintern.h |   8 ++
 lib/tableintern.c                   |  19 ++--
 lib/txt.c                           | 208 +++++++++++++++++++++---------------
 6 files changed, 189 insertions(+), 93 deletions(-)

diff --git a/NEWS b/NEWS
index 3725e03..83b49d5 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ See the end of the file for license conditions.
 
 ** New features
 
+  All programs:
+   - When reading plain-text tables, the blank value for numeric columns
+     can be any string (specified in the special comment-line format
+     described in the "Gnuastro text table format" of the manual). Until
+     now, it had to be a number in the same type.
+
   Arithmetic:
    - New operators:
      - interpolate-minngb: Fill blanks with minimum of nearest neighbors.
@@ -100,6 +106,7 @@ See the end of the file for license conditions.
   bug #58809: NoiseChisel not removing negative outlier tiles.
   bug #58833: Segment crashes when detetion map has blank pixels
   bug #58835: Floating point errors when comparing pixel scale in Crop.
+  bug #58901: Blank values for non-standard integer types in FITS tables.
 
 
 
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 7998442..9a694fd 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -7719,8 +7719,12 @@ A column information comment is assumed to have the 
following format:
 Any sequence of characters between `@key{:}' and `@key{[}' will be interpreted 
as the column name (so it can contain anything except the `@key{[}' character).
 Anything between the `@key{]}' and the end of the line is defined as a comment.
 Within the brackets, anything before the first `@key{,}' is the units 
(physical units, for example km/s, or erg/s), anything before the second 
`@key{,}' is the short type identifier (see below, and @ref{Numeric data 
types}).
+
 Finally (still within the brackets), any non-white characters after the second 
`@key{,}' are interpreted as the blank value for that column (see @ref{Blank 
pixels}).
-Note that blank values will be stored in the same type as the column, not as a 
string@footnote{For floating point types, the @code{nan}, or @code{inf} strings 
(both not case-sensitive) refer to IEEE NaN (not a number) and infinity values 
respectively and will be stored as a floating point, so they are acceptable.}.
+The blank value can either be in the same type as the column (for example 
@code{-99} for a signed integer column), or any string (for example @code{NaN} 
in that same column).
+In both cases, the values will be stored in memory as Gnuastro's fixed blank 
values for each type.
+For floating point types, Gnuastro's internal blank value is IEEE NaN 
(Not-a-Number).
+For signed integers, it is the smallest possible value and for unsigned 
integers its the largest possible value.
 
 When a formatting problem occurs (for example you have specified the wrong 
type code, see below), or the column was already given meta-data in a previous 
comment, or the column number is larger than the actual number of columns in 
the table (the non-commented or empty lines), then the comment information line 
will be ignored.
 
diff --git a/lib/fits.c b/lib/fits.c
index bc71103..7d34dc5 100644
--- a/lib/fits.c
+++ b/lib/fits.c
@@ -2692,7 +2692,18 @@ gal_fits_tab_info(char *filename, char *hdu, size_t 
*numcols,
                         "blank value cannot be deduced", filename, hdu,
                         keyname, index+1);
               else
-                gal_tableintern_read_blank(&allcols[index], value);
+                {
+                  /* Put in the blank value. */
+                  gal_tableintern_read_blank(&allcols[index], value);
+
+                  /* This flag is not relevant for FITS tables. */
+                  if(allcols[index].flag
+                     ==GAL_TABLEINTERN_FLAG_ARRAY_IS_BLANK_STRING)
+                    {
+                      allcols[index].flag=0;
+                      free(allcols[index].array);
+                    }
+                }
             }
         }
 
@@ -3131,14 +3142,33 @@ fits_write_tnull_tcomm(fitsfile *fptr, gal_data_t *col, 
int tableformat,
       break;
 
     case GAL_TABLE_FORMAT_BFITS:
+
       /* FITS binary tables don't accept NULL values for floating point or
-         string columns. For floating point is must be NaN and for strings
+         string columns. For floating point it must be NaN, and for strings
          it is a blank string. */
       if( col->type!=GAL_TYPE_FLOAT32
           && col->type!=GAL_TYPE_FLOAT64
           && col->type!=GAL_TYPE_STRING )
         {
+          /* The blank value must be the raw value within the FITS file
+             (before applying 'TZERO' OR 'TSCAL'). Therefore, because the
+             following integer types aren't native to the FITS standard, we
+             need to correct TNULL for them after applying TZERO. For
+             example for uin16_t, TZERO is 32768, so TNULL has to be 32767
+             (the maximum value of the signed integer with the same
+             width). In this way, adding TZERO to the TNULL, will make it
+             the actual NULL value we assume in Gnuastro for uint16_t (the
+             largest possible number). */
           blank=gal_blank_alloc_write(col->type);
+          switch(col->type)
+            {
+            case GAL_TYPE_INT8:   gal_type_min(GAL_TYPE_UINT8, blank); break;
+            case GAL_TYPE_UINT16: gal_type_max(GAL_TYPE_INT16, blank); break;
+            case GAL_TYPE_UINT32: gal_type_max(GAL_TYPE_INT32, blank); break;
+            case GAL_TYPE_UINT64: gal_type_max(GAL_TYPE_INT64, blank); break;
+            }
+
+          /* Prepare the name and write the keyword. */
           if( asprintf(&keyname, "TNULL%zu", colnum)<0 )
             error(EXIT_FAILURE, 0, "%s: asprintf allocation", __func__);
           fits_write_key(fptr, gal_fits_type_to_datatype(col->type),
diff --git a/lib/gnuastro-internal/tableintern.h 
b/lib/gnuastro-internal/tableintern.h
index 6a405f7..a7e6c5e 100644
--- a/lib/gnuastro-internal/tableintern.h
+++ b/lib/gnuastro-internal/tableintern.h
@@ -52,6 +52,14 @@ __BEGIN_C_DECLS  /* From C++ preparations */
 
 
 
+/* Flags for columns. */
+enum tableintern_flags
+{
+ GAL_TABLEINTERN_FLAG_INVALID,  /* Zero according to FITS standard. */
+ GAL_TABLEINTERN_FLAG_ARRAY_IS_BLANK_STRING,
+};
+
+
 
 
 /************************************************************************/
diff --git a/lib/tableintern.c b/lib/tableintern.c
index e0e10d0..493cd38 100644
--- a/lib/tableintern.c
+++ b/lib/tableintern.c
@@ -37,6 +37,7 @@ along with Gnuastro. If not, see 
<http://www.gnu.org/licenses/>.
 
 #include <gnuastro-internal/timing.h>
 #include <gnuastro-internal/checkset.h>
+#include <gnuastro-internal/tableintern.h>
 
 
 
@@ -405,9 +406,9 @@ gal_tableintern_col_print_info(gal_data_t *col, int 
tableformat,
 
 
 /* Use the input 'blank' string and the input column to put the blank value
-   in the column's array. This function should later be generalized into a
-   function to read a string into a given data type (see
-   'gal_data_string_to_array_elem'). It is only here temporarily. */
+   in the column's array. If the string cannot be interpretted as a blank
+   of that type, then store it in the 'mmapname' element of the data
+   structure. */
 void
 gal_tableintern_read_blank(gal_data_t *col, char *blank)
 {
@@ -423,10 +424,16 @@ gal_tableintern_read_blank(gal_data_t *col, char *blank)
 
   /* Read the blank value as the given type. If successful, then
      'gal_data_string_to_type' will return 0. In that case, we need to
-     initialize the necessary paramters to read this data structure
-     correctly. */
-  if( !gal_type_from_string((void **)(&col->array), blank, col->type) )
+     initialize the necessary parameters to read this data structure
+     correctly. If it isn't successful, then  */
+  if( gal_type_from_string((void **)(&col->array), blank, col->type) )
     {
+      col->flag=GAL_TABLEINTERN_FLAG_ARRAY_IS_BLANK_STRING;
+      gal_checkset_allocate_copy(blank, (char **)(&col->array));
+    }
+  else
+    {
+      col->flag=0;
       col->dsize=gal_pointer_allocate(GAL_TYPE_SIZE_T, 1, 0, __func__,
                                       "col->dsize");
       col->dsize[0]=col->ndim=col->size=1;
diff --git a/lib/txt.c b/lib/txt.c
index 88d03d0..0bfbcc0 100644
--- a/lib/txt.c
+++ b/lib/txt.c
@@ -286,7 +286,11 @@ txt_info_from_comment(char *in_line, gal_data_t **datall, 
char *comm_start,
 
 
       /* Write the blank value into the array. Note that this is not the
-         final column, we are just collecting information now. */
+         final column, we are just collecting information now. If the blank
+         value wasn't interpretted into the given type the 'flag' element
+         of the dataset will be set and the contents of the 'blank' string
+         will be copied into the 'array' element (so it should be
+         interpretted as a 'char *'). */
       gal_tableintern_read_blank(*datall, gal_txt_trim_space(blank));
     }
 
@@ -496,7 +500,7 @@ txt_infoll_to_array(gal_data_t *datall, size_t *numdata)
   /* First find the total number of columns. */
   for(data=datall; data!=NULL; data=data->next) ++numc;
 
-  /* Conversion to an arry is only necessary when there is more than one
+  /* Conversion to an array is only necessary when there is more than one
      element in the list. */
   if(numc>1)
     {
@@ -517,6 +521,7 @@ txt_infoll_to_array(gal_data_t *datall, size_t *numdata)
              of the array. About the pointers, instead of having to
              allocate them again, we will just set them to NULL so
              'gal_data_free' doesn't remove them.*/
+          dataarr[ind].flag       = data->flag;    data->flag=0;
           dataarr[ind].name       = data->name;    data->name=NULL;
           dataarr[ind].unit       = data->unit;    data->unit=NULL;
           dataarr[ind].array      = data->array;   data->array=NULL;
@@ -547,7 +552,8 @@ txt_infoll_to_array(gal_data_t *datall, size_t *numdata)
 
 static void
 txt_get_info_line(char *line, gal_data_t **datall, char *comm_start,
-                  int *firstlinedone, int format, size_t *dsize, int inplace)
+                  int *firstlinedone, int format, size_t *dsize,
+                  int inplace)
 {
   size_t numtokens;
 
@@ -727,94 +733,127 @@ txt_read_token(gal_data_t *data, gal_data_t *info, char 
*token,
   float       *f = data->array,    *fb;
   double      *d = data->array,    *db;
 
-  /* Read the proper token into the column. */
-  switch(data->type)
+  /* See if this token is blank. */
+  int isblankstr = ( info->flag==GAL_TABLEINTERN_FLAG_ARRAY_IS_BLANK_STRING
+                     ? ( strcmp(info->array,token)==0 ? 1 : 0 )
+                     : 0);
+
+  /* If the string is equal to the given blank string, then just write
+     blank and don't bother parsing the token. */
+  if(isblankstr)
     {
-    case GAL_TYPE_STRING:
-      gal_checkset_allocate_copy(gal_txt_trim_space(token), &str[i]);
-      if( (strb=info->array) && !strcmp( *strb, str[i] ) )
+      switch(data->type)
         {
+        case GAL_TYPE_STRING:
           free(str[i]);
-          gal_checkset_allocate_copy(GAL_BLANK_STRING, &str[i]);
+          gal_checkset_allocate_copy(GAL_BLANK_STRING, &str[i]); break;
+        case GAL_TYPE_UINT8:  uc[i] = GAL_BLANK_UINT8;   break;
+        case GAL_TYPE_INT8:    c[i] = GAL_BLANK_INT8;    break;
+        case GAL_TYPE_UINT16: us[i] = GAL_BLANK_UINT16;  break;
+        case GAL_TYPE_INT16:   s[i] = GAL_BLANK_INT16;   break;
+        case GAL_TYPE_UINT32: ui[i] = GAL_BLANK_UINT32;  break;
+        case GAL_TYPE_INT32:  ii[i] = GAL_BLANK_INT32;   break;
+        case GAL_TYPE_UINT64: ul[i] = GAL_BLANK_UINT64;  break;
+        case GAL_TYPE_INT64:   l[i] = GAL_BLANK_INT64;   break;
+        case GAL_TYPE_FLOAT32: f[i] = GAL_BLANK_FLOAT32; break;
+        case GAL_TYPE_FLOAT64: d[i] = GAL_BLANK_FLOAT64; break;
+        default:
+          error(EXIT_FAILURE, 0, "%s: type code %d not recognized in "
+                "'blankstr' switch", __func__, data->type);
         }
-      break;
+    }
 
-    case GAL_TYPE_UINT8:
-      uc[i]=strtol(token, &tailptr, 0);
-      if( (ucb=info->array) && *ucb==uc[i] )
-        uc[i]=GAL_BLANK_UINT8;
-      break;
+  /* Parse the token into the column's dataset. */
+  else
+    {
+      switch(data->type)
+        {
+        case GAL_TYPE_STRING:
+          gal_checkset_allocate_copy(gal_txt_trim_space(token), &str[i]);
+          if( (strb=info->array) && !strcmp( *strb, str[i] ) )
+            {
+              free(str[i]);
+              gal_checkset_allocate_copy(GAL_BLANK_STRING, &str[i]);
+            }
+          break;
 
-    case GAL_TYPE_INT8:
-      c[i]=strtol(token, &tailptr, 0);
-      if( (cb=info->array) && *cb==c[i] )
-        c[i]=GAL_BLANK_INT8;
-      break;
+        case GAL_TYPE_UINT8:
+          uc[i]=strtol(token, &tailptr, 0);
+          if( (ucb=info->array) && *ucb==uc[i] )
+            uc[i]=GAL_BLANK_UINT8;
+          break;
 
-    case GAL_TYPE_UINT16:
-      us[i]=strtol(token, &tailptr, 0);
-      if( (usb=info->array) && *usb==us[i] )
-        us[i]=GAL_BLANK_UINT16;
-      break;
+        case GAL_TYPE_INT8:
+          c[i]=strtol(token, &tailptr, 0);
+          if( (cb=info->array) && *cb==c[i] )
+            c[i]=GAL_BLANK_INT8;
+          break;
 
-    case GAL_TYPE_INT16:
-      s[i]=strtol(token, &tailptr, 0);
-      if( (sb=info->array) && *sb==s[i] )
-        s[i]=GAL_BLANK_INT16;
-      break;
+        case GAL_TYPE_UINT16:
+          us[i]=strtol(token, &tailptr, 0);
+          if( (usb=info->array) && *usb==us[i] )
+            us[i]=GAL_BLANK_UINT16;
+          break;
 
-    case GAL_TYPE_UINT32:
-      ui[i]=strtol(token, &tailptr, 0);
-      if( (uib=info->array) && *uib==ui[i] )
-        ui[i]=GAL_BLANK_UINT32;
-      break;
+        case GAL_TYPE_INT16:
+          s[i]=strtol(token, &tailptr, 0);
+          if( (sb=info->array) && *sb==s[i] )
+            s[i]=GAL_BLANK_INT16;
+          break;
 
-    case GAL_TYPE_INT32:
-      ii[i]=strtol(token, &tailptr, 0);
-      if( (ib=info->array) && *ib==ii[i] )
-        ii[i]=GAL_BLANK_INT32;
-      break;
+        case GAL_TYPE_UINT32:
+          ui[i]=strtol(token, &tailptr, 0);
+          if( (uib=info->array) && *uib==ui[i] )
+            ui[i]=GAL_BLANK_UINT32;
+          break;
 
-    case GAL_TYPE_UINT64:
-      ul[i]=strtoul(token, &tailptr, 0);
-      if( (ulb=info->array) && *ulb==ul[i] )
-        ul[i]=GAL_BLANK_UINT64;
-      break;
+        case GAL_TYPE_INT32:
+          ii[i]=strtol(token, &tailptr, 0);
+          if( (ib=info->array) && *ib==ii[i] )
+            ii[i]=GAL_BLANK_INT32;
+          break;
 
-    case GAL_TYPE_INT64:
-      l[i]=strtol(token, &tailptr, 0);
-      if( (lb=info->array) && *lb==l[i] )
-        l[i]=GAL_BLANK_INT64;
-      break;
+        case GAL_TYPE_UINT64:
+          ul[i]=strtoul(token, &tailptr, 0);
+          if( (ulb=info->array) && *ulb==ul[i] )
+            ul[i]=GAL_BLANK_UINT64;
+          break;
 
-      /* For the blank value of floating point types, we need to make
-         sure it isn't a NaN, because a NaN value will fail on any
-         condition check (even '=='). If it isn't NaN, then we can
-         compare the values. */
-    case GAL_TYPE_FLOAT32:
-      f[i]=strtod(token, &tailptr);
-      if( (fb=info->array)
-          && ( (isnan(*fb) && isnan(f[i])) || *fb==f[i] ) )
-        f[i]=GAL_BLANK_FLOAT64;
-      break;
+        case GAL_TYPE_INT64:
+          l[i]=strtol(token, &tailptr, 0);
+          if( (lb=info->array) && *lb==l[i] )
+            l[i]=GAL_BLANK_INT64;
+          break;
 
-    case GAL_TYPE_FLOAT64:
-      d[i]=strtod(token, &tailptr);
-      if( (db=info->array)
-          && ( (isnan(*db) && isnan(d[i])) || *db==d[i] ) )
-        d[i]=GAL_BLANK_FLOAT64;
-      break;
+          /* For the blank value of floating point types, we need to make
+             sure it isn't a NaN, because a NaN value will fail on any
+             condition check (even '=='). If it isn't NaN, then we can
+             compare the values. */
+        case GAL_TYPE_FLOAT32:
+          f[i]=strtod(token, &tailptr);
+          if( (fb=info->array)
+              && ( (isnan(*fb) && isnan(f[i])) || *fb==f[i] ) )
+            f[i]=GAL_BLANK_FLOAT64;
+          break;
 
-    default:
-      error(EXIT_FAILURE, 0, "%s: type code %d not recognized",
-            __func__, data->type);
-    }
+        case GAL_TYPE_FLOAT64:
+          d[i]=strtod(token, &tailptr);
+          if( (db=info->array)
+              && ( (isnan(*db) && isnan(d[i])) || *db==d[i] ) )
+            d[i]=GAL_BLANK_FLOAT64;
+          break;
 
-  /* If a number couldn't be read properly, then report an error. */
-  if(data->type!=GAL_TYPE_STRING && *tailptr!='\0')
-    error_at_line(EXIT_FAILURE, 0, filename, lineno, "column %zu "
-                  "('%s') couldn't be read as a '%s' number",
-                  colnum, token, gal_type_name(data->type, 1) );
+        default:
+          error(EXIT_FAILURE, 0, "%s: type code %d not recognized",
+                __func__, data->type);
+        }
+
+      /* If a number couldn't be read properly, then report an error. */
+      if(data->type!=GAL_TYPE_STRING && *tailptr!='\0')
+        error_at_line(EXIT_FAILURE, 0, filename, lineno, "column %zu "
+                      "('%s') couldn't be read as a '%s' number",
+                      colnum, token, gal_type_name(data->type, 1) );
+    }
 }
 
 
@@ -822,9 +861,9 @@ txt_read_token(gal_data_t *data, gal_data_t *info, char 
*token,
 
 
 static void
-txt_fill(char *in_line, char **tokens, size_t maxcolnum, gal_data_t *info,
-         gal_data_t *out, size_t rowind, char *filename, size_t lineno,
-         int inplace, int format)
+txt_fill(char *in_line, char **tokens, size_t maxcolnum,
+         gal_data_t *colinfo, gal_data_t *out, size_t rowind,
+         char *filename, size_t lineno, int inplace, int format)
 {
   size_t i, n=0;
   gal_data_t *data;
@@ -857,7 +896,7 @@ txt_fill(char *in_line, char **tokens, size_t maxcolnum, 
gal_data_t *info,
          explanations in 'txt_info_from_first_row'. Note that an image has
          a single 'info' element for the whole array, while a table has one
          for each column. */
-      if( info[format==TXT_FORMAT_TABLE ? n-1 : 0].type == GAL_TYPE_STRING )
+      if( colinfo[format==TXT_FORMAT_TABLE ? n-1 : 0].type == GAL_TYPE_STRING )
         {
           /* Remove any delimiters and stop at the first non-delimiter. If
              we have reached the end of the line then its an error, because
@@ -867,7 +906,7 @@ txt_fill(char *in_line, char **tokens, size_t maxcolnum, 
gal_data_t *info,
 
           /* Everything is good, set the pointer and increment the line to
              the end of the allocated space for this string. */
-          line = (tokens[n]=line) + info[n-1].disp_width;
+          line = (tokens[n]=line) + colinfo[n-1].disp_width;
           if(line<end) *line++='\0';
         }
       else
@@ -894,19 +933,20 @@ txt_fill(char *in_line, char **tokens, size_t maxcolnum, 
gal_data_t *info,
 
   /* Read the desired tokens into the columns that need them. Note that
      when a blank value is defined for the column, the column's array
-     pointer ('info[col->status-1]') is not NULL and points to the blank
-     value. For strings, this will actually be a string. */
+     pointer ('colinfo[col->status-1]') is not NULL and points to the blank
+     value. For strings (or when the blank value is actually a string),
+     this will actually be a string. */
   switch(out->ndim)
     {
     case 1:
       for(data=out; data!=NULL; data=data->next)
-        txt_read_token(data, &info[data->status-1], tokens[data->status],
+        txt_read_token(data, &colinfo[data->status-1], tokens[data->status],
                        rowind, filename, lineno, data->status);
       break;
 
     case 2:
       for(i=0;i<out->dsize[1];++i)
-        txt_read_token(out, info, tokens[i+1], rowind * out->dsize[1] + i,
+        txt_read_token(out, colinfo, tokens[i+1], rowind * out->dsize[1] + i,
                        filename, lineno, i+1);
       break;
[Prev in Thread]
Current Thread
[Next in Thread]
[gnuastro-commits] master c75c40e: Library (txt.h & fits.h): Properly accounting for blank strings, Mohammad Akhlaghi <=
Prev by Date: [gnuastro-commits] master b4344fc 2/2: Book: general tutorial discusses merging three catalogs also
Next by Date: [gnuastro-commits] master d49d978: Library (txt.h): string columns won't force extra \0 in line
Previous by thread: [gnuastro-commits] master updated (5494c91 -> b4344fc)
Next by thread: [gnuastro-commits] master d49d978: Library (txt.h): string columns won't force extra \0 in line
Index(es):
- Date
- Thread