Bug 116461

Summary: getdata borks when the first field in the format file isn't (ASCII) alphabetically the first field.
Product: [Applications] kst Reporter: Matthew Truch <matt>
Component: generalAssignee: kst
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 1.x   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Patch to fix the above

Description Matthew Truch 2005-11-16 00:55:25 UTC
Version:           1.2.0_svn_480257 (using KDE KDE 3.4.2)
OS:                Linux

Currently a dirfiles 'length' (number of frames) is defined by the first (valid) field in the format file.  Traditionally (on BLAST at least) this field is called FORMAT and (most) other fields are in lowercase.  The fact that the number of frames is defined by the first field is important so that data writing programs can write the first field last so that the race condition of not having some of the data written yet bug doesn't come to light.  However, getdata sorts the raw field list (with qsort), so the first field in the format file isn't neccesarily the first field in the list.  This can cause kst to either report the wrong number of frames available (if the number of samples per frame and/or size of the type differ) or the  race condition mentioned above.
Comment 1 Matthew Truch 2005-11-16 00:57:09 UTC
Created attachment 13486 [details]
Patch to fix the above

A patch by Don Wiebe to fix the above issue.  I won't commit it until people
have a look at to to make sure this is kosher.	

Note, the patch will apply, even though there have been some offsets introduced
since the code we generated the fix from was made.  The patch is relative to
the datasources/dirfile/ directory.
Comment 2 D. V. Wiebe 2005-11-16 01:16:37 UTC
(For "FORMAT" read "FASTSAMP" in Matt's report.)
Comment 3 D. V. Wiebe 2005-11-16 01:33:14 UTC
I should point out that the supplied patch completely deprecates the third argument to GetNFrames (in_field) and will always look at the first valid raw field in the format file, whatever it is passed for that parameter.

This is the behaviour Barth and I settled on when we hammered out getting defile (the BLAST dirfile writer) and kst to behave well together, but I don't know if this convention is universal.  AFAICT, kst never passes a non-NULL value for in_field when calling GetNFrames, so it shouldn't care.
Comment 4 Ted Kisner 2005-11-16 02:05:37 UTC
> 01:33 ------- I should point out that the supplied patch completely
> deprecates the third argument to GetNFrames (in_field) and will always look
> at the first valid raw field in the format file, whatever it is passed for
> that parameter.


In my dirfile library here:

http://cmb.phys.cwru.edu/kisner/code/dirfile/

I always return the number of available samples for the desired field.  This 
code is based on Barth's original getData, with lots of memory leak plugging 
and full "putData" functionality implemented by me.  Frankly I think it is 
silly to assume that all fields will have the same number of samples on disk.

> This is the behaviour Barth and I settled on when we hammered out getting
> defile (the BLAST dirfile writer) and kst to behave well together,


I still don't understand why you guys didn't just use my putData stuff for 
writing dirfiles- it does all of the reverse mapping from linterp/lincom/bit 
fields to the underlying RAW field.  But whatever- I know from personal 
experience that sometimes everyone needs to reinvent a wheel or 3 ;-)

> but I 
> don't know if this convention is universal.  AFAICT, kst never passes a
> non-NULL value for in_field when calling GetNFrames, so it shouldn't care.


I certainly have used kst to plot dirfile directories where the fields in the 
format file all have different sizes.  Is this situation formally 
unsupported?  In my opinion, that removes much of the usefullness of the 
dirfile format.

-Ted
Comment 5 George Staikos 2005-11-16 02:17:00 UTC
On Tuesday 15 November 2005 20:05, Ted Kisner wrote:
> > 01:33 ------- I should point out that the supplied patch completely
> > deprecates the third argument to GetNFrames (in_field) and will always
> > look at the first valid raw field in the format file, whatever it is
> > passed for that parameter.
>
> In my dirfile library here:
>
> http://cmb.phys.cwru.edu/kisner/code/dirfile/
>
> I always return the number of available samples for the desired field. 
> This code is based on Barth's original getData, with lots of memory leak
> plugging and full "putData" functionality implemented by me.  Frankly I
> think it is silly to assume that all fields will have the same number of
> samples on disk.
>
> > This is the behaviour Barth and I settled on when we hammered out getting
> > defile (the BLAST dirfile writer) and kst to behave well together,
>
> I still don't understand why you guys didn't just use my putData stuff for
> writing dirfiles- it does all of the reverse mapping from
> linterp/lincom/bit fields to the underlying RAW field.  But whatever- I
> know from personal experience that sometimes everyone needs to reinvent a
> wheel or 3 ;-)


  If you have some time to hook this into a datasource we could definitely 
give it a try too.  The current one is just Kst legacy... it was there and 
just got poked around as needed.
Comment 6 Ted Kisner 2005-11-16 02:41:03 UTC
On Tuesday 15 November 2005 17:17, George Staikos wrote:
> 
Comment 7 Matthew Truch 2005-11-16 02:46:48 UTC
On Tue, Nov 15, 2005 at 05:05:11PM -0800, Ted Kisner wrote:
> I still don't understand why you guys didn't just use my putData stuff for 
> writing dirfiles- it does all of the reverse mapping from linterp/lincom/bit 
> fields to the underlying RAW field.  But whatever- I know from personal 
> experience that sometimes everyone needs to reinvent a wheel or 3 ;-)


BLAST's dirfile writer (what became 'defile') was written when dirfiles
were invented, so it's pretty old.  

> > but I 
> > don't know if this convention is universal.  AFAICT, kst never passes a
> > non-NULL value for in_field when calling GetNFrames, so it shouldn't care.
> 
> I certainly have used kst to plot dirfile directories where the fields in the 
> format file all have different sizes.  Is this situation formally 
> unsupported?  In my opinion, that removes much of the usefullness of the 
> dirfile format.


AFAIK (someone please correct me if I'm wrong), kst assumes that the
dirfile has a uniform number of frames, and if kst tries to read more
data than a raw field has it enters a race (and consumes all your cpu).  
Comment 8 D. V. Wiebe 2005-11-16 02:57:27 UTC
> I still don't understand why you guys didn't just use my putData stuff for
> writing dirfiles- it does all of the reverse mapping from linterp/lincom/bit
> fields to the underlying RAW field.  But whatever- I know from personal
> experience that sometimes everyone needs to reinvent a wheel or 3 ;-)

I had no desire to reinvent anything when I wrote defile.  We hacked the dirfile writer that Barth wrote when he first wrote getdata out of the BLAST flight code because we didn't want it there anymore and repackaged it as a groundstation app.  Had I known putData existed at the time, I might have looked into it, but I didn't.

That aside, speaking as a getdata user, if these bugs we're encountering are already fixed, it would be really nice if they could be incorporated into the getdata trunk so we don't have to waste time re-fixing them.
Comment 9 George Staikos 2005-11-16 03:06:27 UTC
I know they're not all your fault, but these sprintf() are dangerous.  They 
should use snprintf instead.  There is a security impact in the sense that we 
hit files on remote sites with KIO so there is a possibility for someone to 
cause an overflow.  I would approve the patch with those fixed.
Comment 10 Ted Kisner 2005-11-16 03:26:17 UTC
On Tuesday 15 November 2005 17:57, D.V.Wiebe wrote:
> I had no desire to reinvent anything when I wrote defile. 
Comment 11 Matthew Truch 2005-11-16 03:37:52 UTC
SVN commit 480677 by truch:

Apply the patch to fix the 'getdata borked' bug for now.  

BUG: 116461


 M  +19 -23    getdata.c  
 M  +1 -0      getdata_struct.h  


--- trunk/extragear/graphics/kst/kst/datasources/dirfile/getdata.c #480676:480677
@@ -287,13 +287,15 @@
 /*                                                                         */
 /***************************************************************************/
 struct FormatType *GetFormat(const char *filedir, int *error_code) {
-  int i_format;
+  int i_format, i;
+  struct stat statbuf;
   FILE *fp;
   char format_file[MAX_FILENAME_LENGTH+6];
   char instring[MAX_LINE_LENGTH];
   struct FormatType *F;
   char in_cols[MAX_IN_COLS][MAX_LINE_LENGTH];
   int n_cols;
+  char raw_data_filename[MAX_FILENAME_LENGTH+FIELD_LENGTH+2];
 
   /** first check to see if we have already read it **/
   for (i_format=0; i_format<Formats.n; i_format++) {
@@ -312,7 +314,7 @@
   F = Formats.F+Formats.n-1;
 
   /***** open the format file (if there is one) ******/
-  sprintf(format_file,"%s/format", filedir);
+  snprintf(format_file, MAX_FILENAME_LENGTH+6, "%s/format", filedir);
   fp = fopen(format_file, "r");
   if (fp == NULL) {
     *error_code = GD_E_OPEN_FORMAT;
@@ -406,6 +408,15 @@
   }
   /** Now sort the lists */
   if (F->n_raw > 1) {
+    for (i=0; i<F->n_raw; i++) {
+      snprintf(raw_data_filename, MAX_FILENAME_LENGTH+FIELD_LENGTH+2, 
+          "%s/%s", filedir, F->rawEntries[i].field);
+      if (stat(raw_data_filename, &statbuf) >=0) {
+        F->first_field = F->rawEntries[i];
+        break;
+      }
+    }
+
     qsort(F->rawEntries, F->n_raw, sizeof(struct RawEntryType),
         RawCmp);
   }
@@ -891,7 +902,8 @@
 
   /** open the file (and cache the fp) if it hasn't been opened yet. */
   if (R->fp <0) {
-    sprintf(datafilename, "%s/%s", F->FileDirName, field_code);
+    snprintf(datafilename, MAX_FILENAME_LENGTH+FIELD_LENGTH + 1, 
+        "%s/%s", F->FileDirName, field_code);
     R->fp = open(datafilename, O_RDONLY);
     if (R->fp<0) {
       *n_read = 0;
@@ -1682,9 +1694,6 @@
   char raw_data_filename[MAX_FILENAME_LENGTH+FIELD_LENGTH+2];
   struct stat statbuf;
   int nf;
-  char field_buf[80];
-  const char *field;
-  int i;
 
   *error_code = GD_E_OK;
 
@@ -1706,28 +1715,15 @@
     return(0);
   }
 
-  if (in_field == NULL) {
-    /* check for first valid raw field */
-    for (i=0; i<F->n_raw; i++) {
-      sprintf(raw_data_filename,"%s/%s", filename, F->rawEntries[i].field);
-      if (stat(raw_data_filename, &statbuf) >=0) {
-        strncpy(field_buf, F->rawEntries[i].field, 79);
-        i = F->n_raw;
-      }
-    }
-    field = field_buf;
-  } else {
-    field = in_field;
-  }
-
-  sprintf(raw_data_filename,"%s/%s", filename, field);
+  /* load the first valid raw field */
+  snprintf(raw_data_filename, MAX_FILENAME_LENGTH+FIELD_LENGTH+2, 
+      "%s/%s", filename, F->first_field.field);
   if (stat(raw_data_filename, &statbuf) < 0) {
     return(0);
   }
 
   nf = statbuf.st_size/
-    (F->rawEntries[0].size*F->rawEntries[0].samples_per_frame);
-
+    (F->first_field.size*F->first_field.samples_per_frame);
   nf += F->frame_offset;
 
   nf -= 2;
--- trunk/extragear/graphics/kst/kst/datasources/dirfile/getdata_struct.h #480676:480677
@@ -69,6 +69,7 @@
 struct FormatType {
   char FileDirName[MAX_FILENAME_LENGTH];
   int frame_offset;
+  struct RawEntryType first_field;
   struct RawEntryType *rawEntries;
   int n_raw;
   struct LincomEntryType *lincomEntries;