Summary: | Command line option -f fails for some ascii files | ||
---|---|---|---|
Product: | [Applications] kst | Reporter: | Ted Kisner <tskisner.public> |
Component: | general | Assignee: | kst |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | HI | ||
Version: | 1.x | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Ted Kisner
2006-01-26 18:54:53 UTC
On Thursday 26 January 2006 12:54, Ted Kisner wrote:
> I'll try to dig in to this over the weekend, but maybe someone knows
> exactly where the problem is, due to recent changes in main.cpp, the
> datasource, etc? _______________________________________________
I think it has to be in the datasource.
I think if you open up kst, go to the settings menu, datasources tab and change the ASCII settings there to tell it to skip 1 line, save and exit, it should work from the commandline as well... Alternatively, you can modify the appropriate line in kstdatasourcerc with a script if you really need to automate this. On Friday 27 January 2006 00:08, Nicolas Brisset wrote | ------- Additional Comments From nicolas.brisset eurocopter com 2006-01-27 | 09:08 ------- I think if you open up kst, go to the settings menu, | datasources tab and change the ASCII settings there to tell it to skip 1 | line, save and exit, it should work from the commandline as well... | Alternatively, you can modify the appropriate line in kstdatasourcerc with | a script if you really need to automate this. But the point is that it *used* to work, and this is an important feature from my perspective (and at least several of my colleagues). I don't want to have to open kst to change a setting everytime I want to make a plot of a new file from the command line. That kind of defeats the purpose of using the command line. Do you have any ideas where this feature got broken in the datasource? I'm busy today, but will look at it this weekend. Fixing this is a high priority for me, since I'm in process of generating *thousands* of such files, and it's really annoying to have to do "tail -<lines-1> file > temp" every time I want to plot something. -Ted On Friday 27 January 2006 09:06, Barth Netterfield wrote: | BUT: the data source has to report back what fields are valid, without | knowing what row you are starting to read from. On Friday 27 January 2006 09:11, Brisset, Nicolas wrote: | You only have to open kst if you need to change the settings, but as | long as all the files you are dealing with have the same format (which | should be the case at least 99.9% of the time), you can use the command | line without any problem. Unless I'm missing something, this is simply not true. kst saves the config *for a given file*. Let's say I generate a thousand files with the same format. Now I open one file in kst and go to the datasource and configure it how I want. kst remembers this setting *for this file*. If I open any of the other 999 files, this setting is not true. | I can't imagine how this could have worked | differently since the introduction of ASCII configurations a few months | back... I haven't used kst for this kind of ascii data in a while, so I'm not sure when the feature broke. -Ted On Friday 27 January 2006 11:17, Ted Kisner wrote: | What do you all think? On Friday 27 January 2006 11:17, Ted Kisner wrote: | What do you all think? On Friday 27 January 2006 09:11, Brisset, Nicolas wrote: | I can't imagine how this could have worked | differently since the introduction of ASCII configurations a few months | back... ok, looking at the code, the last time I used kst for this stuff must have been before the current ASCII datasource was in use. I guess the previous method of reading ascii files was dumb enough to trust the user and "do the right thing". As Barth mentioned, we now need the data source to be smart enough to determine the number of fields ahead of the user requesting the starting line. I'm not sure what the right method is. Perhaps we could have the datasource keep it's current behaviour (return the fields at line 0), but with these changes: 1. add another parameter to the fieldListFor function that specifies the starting line. 2. if readField is called with a field that is not in the fieldList, do an additional call to fieldListFor with the current start frame to detect any changes in the field list. If the field is still not valid, the return an error like normal. So this has the following benefits: 1. small code changes. 2. The available fields are only re-scanned if the requested field is invalid. In situations with a constant number of columns, this never happens. What do you all think? Should I make these changes? Are there any consequences I've missed? -Ted As an answer to the question in the first part of comment #5 above, the ASCII datasource has BOTH general/default settings (changed via the kst Configure->Data sources dialog), AND per-file settings (changed when you click the "Configure..." button in the first page of the datawizard. The latter override defaults. At first I was confused about this, but I've grown to like this feature and I think the current behavior is pretty powerful and convenient at the same time. Could you check whether settign the defautls via the kst settings menu works for your case (i.e. allows you to read your 1000 files without further changes) ? If so, I'd recomment documenting the current behavior better and leaving the code as is. See below for my proposed patch... On Monday 30 January 2006 01:09, Brisset, Nicolas wrote: | I understand the concern that the command line should work, but as I | wrote a few minutes ago in the bugzilla report, I pretty much like the | current behavior and I wouldn't want it to be changed too much, as it is | also critical for kst to be able to accomodate various file formats | easily (which is the reason I asked for this feature in the very | beginning quite a while ago, and the need is still here !). None of my changes should affect your useage. The only non-commandline impact is that the initial fieldlist in the create new vector dialog is generated with the scanning technique. Apparently you only use text files with a fixed number of columns, so you should have no problems. | What's more, | I think it currently works if you set the default settings to the right | values, and not file-specific values... Ok, but even if there is a global default, what if I go view a different format ascii file and then come back? I would have to open kst and change some global setting every time. I'm sorry, but having to use the GUI for configuration really defeats the useage of kst as a fast tool to view data from the commandline- which is what we use it for 90% of the time. | > Perhaps lines 1, 5, 10, and 50, rather than the 1st 50... but | > otherwise, I agree. And it is a significant regression. | That could be nice for some "auto-detect" mode, but I doubt we can | always rely on this. Could be the default kst ships with, though... I think this will be fine for nearly all cases. Certainly it is better than the current case where kst just refuses to display the data. Here is what this patch does: 1. In AsciiSource::fieldListFor, if skip (=_dataLines) is non-zero, then the code behaves as it does currently. 2. If skip == 0, then we scan a sampling of (non-comment) lines at the beginning of the file. Currently I scan lines 0, 1, 3, 7, 15, 31, 63, 127 and take the maximum number of columns found to be the number for the whole file. So if you start kst and go to the data manager and create a new vector from an ascii file, then initial fieldlist is populated by this "scanning" procedure. If you go to the "configure..." screen and specify a non-zero starting frame, then the number of columns is determined from this line alone (like current behaviour). Is this ok to commit? I also added a FIXME for the future when we might consider changing the datasource API. -Ted --- kst/datasources/ascii/ascii.cpp (revision 503725) +++ kst/datasources/ascii/ascii.cpp (working copy) @@ -32,7 +32,7 @@ #include <qstylesheet.h> #include <kcombobox.h> -#include <kdebug.h> +#include <ksdebug.h> #include <kstmath.h> #include "ascii.h" @@ -605,7 +605,6 @@ return _numFrames < 1; } - QStringList AsciiSource::fieldListFor(const QString& filename, AsciiSource::Config *cfg) { QStringList rc; QFile file(filename); @@ -651,16 +650,59 @@ bool done = false; QString line; int skip = cfg->_dataLine; - while (!file.atEnd() && !done) { + //FIXME This is a hack which should eventually be fixed by specifying + // the starting frame of the data when calling KstDataSource::fieldListForSource + // and KstDataSource::fieldList. If the skip value is not specified, then + // we scan a few lines and take the maximum number of fields that we find. + int maxcnt; + if (skip > 0) { + maxcnt = -1; + } else { + maxcnt = 0; + } + int cnt; + int nextscan = 0; + int curscan = 0; + while (!file.atEnd() && !done && (nextscan < 200)) { int r = readFullLine(file, line); - if (skip > 0) { + if (skip > 0) { //keep skipping until desired line --skip; if (r < 0) { return rc; } continue; } - if (r > 1 && !re.exactMatch(line)) { + if (maxcnt >= 0) { //original skip value == 0, so scan some lines + if (curscan >= nextscan) { + if (r > 1 && !re.exactMatch(line)) { + line = line.stripWhiteSpace(); + if (cfg->_columnType == AsciiSource::Config::Custom && !cfg->_columnDelimiter.isEmpty()) { + cnt = QStringList::split(QRegExp(QString("[%1]").arg(QRegExp::escape(cfg->_columnDelimiter))), line, false).count(); + } else if (cfg->_columnType == AsciiSource::Config::Fixed) { + cnt = line.length() / cfg->_columnWidth; + } else { + cnt = QStringList::split(QRegExp("\\s"), line, false).count(); + } + if (cnt > maxcnt) { + maxcnt = cnt; + } + } else if (r < 0) { + return rc; + } + nextscan += nextscan + 1; + } + curscan++; + continue; + } + if (r > 1 && !re.exactMatch(line)) { //at desired line, find count + line = line.stripWhiteSpace(); + if (cfg->_columnType == AsciiSource::Config::Custom && !cfg->_columnDelimiter.isEmpty()) { + maxcnt = QStringList::split(QRegExp(QString("[%1]").arg(QRegExp::escape(cfg->_columnDelimiter))), line, false).count(); + } else if (cfg->_columnType == AsciiSource::Config::Fixed) { + maxcnt = line.length() / cfg->_columnWidth; + } else { + maxcnt = QStringList::split(QRegExp("\\s"), line, false).count(); + } done = true; } else if (r < 0) { return rc; @@ -668,17 +710,7 @@ } file.close(); - - int cnt; - line = line.stripWhiteSpace(); - if (cfg->_columnType == AsciiSource::Config::Custom && !cfg->_columnDelimiter.isEmpty()) { - cnt = QStringList::split(QRegExp(QString("[%1]").arg(QRegExp::escape(cfg->_columnDelimiter))), line, false).count(); - } else if (cfg->_columnType == AsciiSource::Config::Fixed) { - cnt = line.length() / cfg->_columnWidth; - } else { - cnt = QStringList::split(QRegExp("\\s"), line, false).count(); - } - for (int i = 1; i <= cnt; ++i) { + for (int i = 1; i <= maxcnt; ++i) { rc += QString::number(i); } Please commit your proposed patch. This patch made it into 1.2.0. Forgot to close bug report. REMIND isn't really for this *** Bug has been marked as fixed ***. |