Bug 120827 - Command line option -f fails for some ascii files
Summary: Command line option -f fails for some ascii files
Status: RESOLVED FIXED
Alias: None
Product: kst
Classification: Applications
Component: general (show other bugs)
Version: 1.x
Platform: unspecified Linux
: HI normal
Target Milestone: ---
Assignee: kst
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-26 18:54 UTC by Ted Kisner
Modified: 2006-02-15 15:18 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ted Kisner 2006-01-26 18:54:53 UTC
Version:           1.2.0_devel (using KDE 3.5.0, Kubuntu Package 4:3.5.0-0ubuntu0breezy1 breezy)
Compiler:          Target: x86_64-linux-gnu
OS:                Linux (x86_64) release 2.6.12-10-amd64-k8

When specifying the "-f" command line option, the ascii datasource scans for the number of columns *in the first row*, rather than starting at the specified frame.  For example, if I have a text file like this:

16385
0.000000e+00 6.785501e+05
1.831055e-03 5.238199e+05
3.662109e-03 4.617861e+05
5.493164e-03 4.578143e+05
7.324219e-03 4.263500e+05

and would like to plot the second column, I should be able to specify:

kst -f 1 -x 1 -y 2 <filename>

However, the number of available fields is determined by the first frame, not the frame specified by the "-f" option.  This is a regression (it used to work fine).  I think this feature is pretty important.

I'll try to dig in to this over the weekend, but maybe someone knows exactly where the problem is, due to recent changes in main.cpp, the datasource, etc?
Comment 1 George Staikos 2006-01-26 20:07:03 UTC
On Thursday 26 January 2006 12:54, Ted Kisner wrote:
> I'll try to dig in to this over the weekend, but maybe someone knows
> exactly where the problem is, due to recent changes in main.cpp, the
> datasource, etc? _______________________________________________


  I think it has to be in the datasource.
Comment 2 Nicolas Brisset 2006-01-27 09:08:40 UTC
I think if you open up kst, go to the settings menu, datasources tab and change the ASCII settings there to tell it to skip 1 line, save and exit, it should work from the commandline as well... 
Alternatively, you can modify the appropriate line in kstdatasourcerc with a script if you really need to automate this.
Comment 3 Ted Kisner 2006-01-27 17:42:21 UTC
On Friday 27 January 2006 00:08, Nicolas Brisset wrote
| ------- Additional Comments From nicolas.brisset eurocopter com  2006-01-27
| 09:08 ------- I think if you open up kst, go to the settings menu,
| datasources tab and change the ASCII settings there to tell it to skip 1
| line, save and exit, it should work from the commandline as well...
| Alternatively, you can modify the appropriate line in kstdatasourcerc with
| a script if you really need to automate this.

But the point is that it *used* to work, and this is an important feature from 
my perspective (and at least several of my colleagues).  I don't want to have 
to open kst to change a setting everytime I want to make a plot of a new file 
from the command line.  That kind of defeats the purpose of using the command 
line.

Do you have any ideas where this feature got broken in the datasource?  I'm 
busy today, but will look at it this weekend.  Fixing this is a high priority 
for me, since I'm in process of generating *thousands* of such files, and 
it's really annoying to have to do "tail -<lines-1> file > temp" every time I 
want to plot something.

-Ted
Comment 4 Ted Kisner 2006-01-27 19:01:48 UTC
On Friday 27 January 2006 09:06, Barth Netterfield wrote:
| BUT: the data source has to report back what fields are valid, without
| knowing what row you are starting to read from. 
Comment 5 Ted Kisner 2006-01-27 21:11:20 UTC
On Friday 27 January 2006 09:11, Brisset, Nicolas wrote:
| You only have to open kst if you need to change the settings, but as
| long as all the files you are dealing with have the same format (which
| should be the case at least 99.9% of the time), you can use the command
| line without any problem. 

Unless I'm missing something, this is simply not true.  kst saves the config 
*for a given file*.  Let's say I generate a thousand files with the same 
format.  Now I open one file in kst and go to the datasource and configure it 
how I want.  kst remembers this setting *for this file*.  If I open any of 
the other 999 files, this setting is not true.

| I can't imagine how this could have worked 
| differently since the introduction of ASCII configurations a few months
| back...

I haven't used kst for this kind of ascii data in a while, so I'm not sure 
when the feature broke.

-Ted
Comment 6 Ted Kisner 2006-01-27 21:11:24 UTC
On Friday 27 January 2006 11:17, Ted Kisner wrote:
| What do you all think? 
Comment 7 Ted Kisner 2006-01-27 21:11:27 UTC
On Friday 27 January 2006 11:17, Ted Kisner wrote:
| What do you all think? 
Comment 8 Ted Kisner 2006-01-27 21:11:31 UTC
On Friday 27 January 2006 09:11, Brisset, Nicolas wrote:
| I can't imagine how this could have worked
| differently since the introduction of ASCII configurations a few months
| back...

ok, looking at the code, the last time I used kst for this stuff must have 
been before the current ASCII datasource was in use.  I guess the previous 
method of reading ascii files was dumb enough to trust the user and "do the 
right thing".

As Barth mentioned, we now need the data source to be smart enough to 
determine the number of fields ahead of the user requesting the starting 
line.  I'm not sure what the right method is.

Perhaps we could have the datasource keep it's current behaviour (return the 
fields at line 0), but with these changes:

1.  add another parameter to the fieldListFor function that specifies the 
starting line.

2.  if readField is called with a field that is not in the fieldList, do an 
additional call to fieldListFor with the current start frame to detect any 
changes in the field list.  If the field is still not valid, the return an 
error like normal.

So this has the following benefits:

1.  small code changes.

2.  The available fields are only re-scanned if the requested field is 
invalid.  In situations with a constant number of columns, this never 
happens.

What do you all think?  Should I make these changes?  Are there any 
consequences I've missed?

-Ted
Comment 9 Nicolas Brisset 2006-01-30 09:21:59 UTC
As an answer to the question in the first part of comment #5 above, the ASCII datasource has BOTH general/default settings (changed via the kst Configure->Data sources dialog), AND per-file settings (changed when you click the "Configure..." button in the first page of the datawizard. The latter override defaults. At first I was confused about this, but I've grown to like this feature and I think the current behavior is pretty powerful and convenient at the same time.
Could you check whether settign the defautls via the kst settings menu works for your case (i.e. allows you to read your 1000 files without further changes) ? If so, I'd recomment documenting the current behavior better and leaving the code as is.
Comment 10 Ted Kisner 2006-01-30 23:04:14 UTC
See below for my proposed patch...

On Monday 30 January 2006 01:09, Brisset, Nicolas wrote:
| I understand the concern that the command line should work, but as I
| wrote a few minutes ago in the bugzilla report, I pretty much like the
| current behavior and I wouldn't want it to be changed too much, as it is
| also critical for kst to be able to accomodate various file formats
| easily (which is the reason I asked for this feature in the very
| beginning quite a while ago, and the need is still here !). 

None of my changes should affect your useage.  The only non-commandline impact 
is that the initial fieldlist in the create new vector dialog is generated 
with the scanning technique.  Apparently you only use text files with a fixed 
number of columns, so you should have no problems.

| What's more, 
| I think it currently works if you set the default settings to the right
| values, and not file-specific values...

Ok, but even if there is a global default, what if I go view a different 
format ascii file and then come back?  I would have to open kst and change 
some global setting every time.  I'm sorry, but having to use the GUI for 
configuration really defeats the useage of kst as a fast tool to view data 
from the commandline- which is what we use it for 90% of the time.

| > Perhaps lines 1, 5, 10, and 50, rather than the 1st 50... but
| > otherwise, I agree.  And it is a significant regression.
| That could be nice for some "auto-detect" mode, but I doubt we can
| always rely on this. Could be the default kst ships with, though...

I think this will be fine for nearly all cases.  Certainly it is better than 
the current case where kst just refuses to display the data.

Here is what this patch does:

1.  In AsciiSource::fieldListFor, if skip (=_dataLines) is non-zero, then the 
code behaves as it does currently.

2.  If skip == 0, then we scan a sampling of (non-comment) lines at the 
beginning of the file.  Currently I scan lines 0, 1, 3, 7, 15, 31, 63, 127 
and take the maximum number of columns found to be the number for the whole 
file.

So if you start kst and go to the data manager and create a new vector from an 
ascii file, then initial fieldlist is populated by this "scanning" procedure.  
If you go to the "configure..." screen and specify a non-zero starting frame, 
then the number of columns is determined from this line alone (like current 
behaviour).

Is this ok to commit?  I also added a FIXME for the future when we might 
consider changing the datasource API.

-Ted


--- kst/datasources/ascii/ascii.cpp     (revision 503725)
+++ kst/datasources/ascii/ascii.cpp     (working copy)
@@ -32,7 +32,7 @@
 #include <qstylesheet.h>

 #include <kcombobox.h>
-#include <kdebug.h>
+#include <ksdebug.h>

 #include <kstmath.h>
 #include "ascii.h"
@@ -605,7 +605,6 @@
   return _numFrames < 1;
 }

-
 QStringList AsciiSource::fieldListFor(const QString& filename, 
AsciiSource::Config *cfg) {
   QStringList rc;
   QFile file(filename);
@@ -651,16 +650,59 @@
   bool done = false;
   QString line;
   int skip = cfg->_dataLine;
-  while (!file.atEnd() && !done) {
+  //FIXME This is a hack which should eventually be fixed by specifying
+  // the starting frame of the data when calling 
KstDataSource::fieldListForSource
+  // and KstDataSource::fieldList.  If the skip value is not specified, then
+  // we scan a few lines and take the maximum number of fields that we find.
+  int maxcnt;
+  if (skip > 0) {
+    maxcnt = -1;
+  } else {
+    maxcnt = 0;
+  }
+  int cnt;
+  int nextscan = 0;
+  int curscan = 0;
+  while (!file.atEnd() && !done && (nextscan < 200)) {
     int r = readFullLine(file, line);
-    if (skip > 0) {
+    if (skip > 0) { //keep skipping until desired line
       --skip;
       if (r < 0) {
         return rc;
       }
       continue;
     }
-    if (r > 1 && !re.exactMatch(line)) {
+    if (maxcnt >= 0) { //original skip value == 0, so scan some lines
+      if (curscan >= nextscan) {
+        if (r > 1 && !re.exactMatch(line)) {
+          line = line.stripWhiteSpace();
+          if (cfg->_columnType == AsciiSource::Config::Custom 
&& !cfg->_columnDelimiter.isEmpty()) {
+            cnt = 
QStringList::split(QRegExp(QString("[%1]").arg(QRegExp::escape(cfg->_columnDelimiter))), 
line, false).count();
+          } else if (cfg->_columnType == AsciiSource::Config::Fixed) {
+            cnt = line.length() / cfg->_columnWidth;
+          } else {
+            cnt = QStringList::split(QRegExp("\\s"), line, false).count();
+          }
+          if (cnt > maxcnt) {
+            maxcnt = cnt;
+          }
+        } else if (r < 0) {
+          return rc;
+        }
+        nextscan += nextscan + 1;
+      }
+      curscan++;
+      continue;
+    }
+    if (r > 1 && !re.exactMatch(line)) { //at desired line, find count
+      line = line.stripWhiteSpace();
+      if (cfg->_columnType == AsciiSource::Config::Custom 
&& !cfg->_columnDelimiter.isEmpty()) {
+        maxcnt = 
QStringList::split(QRegExp(QString("[%1]").arg(QRegExp::escape(cfg->_columnDelimiter))), 
line, false).count();
+      } else if (cfg->_columnType == AsciiSource::Config::Fixed) {
+        maxcnt = line.length() / cfg->_columnWidth;
+      } else {
+        maxcnt = QStringList::split(QRegExp("\\s"), line, false).count();
+      }
       done = true;
     } else if (r < 0) {
       return rc;
@@ -668,17 +710,7 @@
   }

   file.close();
-
-  int cnt;
-  line = line.stripWhiteSpace();
-  if (cfg->_columnType == AsciiSource::Config::Custom 
&& !cfg->_columnDelimiter.isEmpty()) {
-    cnt = 
QStringList::split(QRegExp(QString("[%1]").arg(QRegExp::escape(cfg->_columnDelimiter))), 
line, false).count();
-  } else if (cfg->_columnType == AsciiSource::Config::Fixed) {
-    cnt = line.length() / cfg->_columnWidth;
-  } else {
-    cnt = QStringList::split(QRegExp("\\s"), line, false).count();
-  }
-  for (int i = 1; i <= cnt; ++i) {
+  for (int i = 1; i <= maxcnt; ++i) {
     rc += QString::number(i);
   }
Comment 11 Netterfield 2006-02-15 02:06:07 UTC
Please commit your proposed patch.
Comment 12 Ted Kisner 2006-02-15 02:21:36 UTC
This patch made it into 1.2.0.  Forgot to close bug report.
Comment 13 George Staikos 2006-02-15 15:17:44 UTC
REMIND isn't really for this
Comment 14 George Staikos 2006-02-15 15:18:25 UTC
*** Bug has been marked as fixed ***.