Version: 1.1.0_beta2 (using KDE KDE 3.4.0) Installed from: Compiled From Sources Compiler: gcc 3.3.3 OS: Linux When loading vectors from an ASCII files, NaN (Not a Number) values are skipped, which can result in pretty bad and hard to understand problems. For instance, imagine I have a file with X and Y vectors, with some data missing and replaced by some non-numeric text string in the data file (it sometimes happens in the real world!). If 10 values are missing for X and 20 for Y, when I plot Y vs X I may get erratic results because of the skipped values and automatic interpolation performed by kst. I think NaN values should be preserved in the vectors, and resulting points skipped when plotting. If the plot uses lines, when there are such points they should be "cut".
Just to clarify: NaN points within vectors will not be plotted and will be skipped when plotting. The only problem is with the ascii datasource.
Could you please attach a date file that shows the problem and details of what you're plotting against what.
Created attachment 11485 [details] Nice testcase (world map) When the attached file is loaded (world.txt), NaN values are "skipped" at some point (I don't think it happens at the datasource level, but rather in one of the calling functions). The expected result is a nice world map, without connection lines everywhere :-)
Created attachment 11486 [details] Nice testcase (world map) When the attached file is loaded (world.txt), NaN values are "skipped" at some point (I don't think it happens at the datasource level, but rather in one of the calling functions). The expected result is a nice world map, without connection lines everywhere :-)
This problem does arise at the data source level. Any line that does not contain at least one digit is simply discarded. Thus, the data source will read in both of the following: 100.0 NaN NaN 100.0 but will not read in: NaN NaN
This behaviour is defined in AsciiSource::update(int u). The question remaining is do we want to fix it? I would suggest that we do as if the data stream represents points sampled every second then a line comprised of all NaN values that is simply skipped will change the meaning of that data.
I've made changes to both kst2dplot.cpp and ascii.cpp to fix this problem. The changes to ascii.cpp should be reviewed as things may run slower than before. If I don't get any feedback in the next couple of days I'll mark this bug report as fixed. Changes to kst2dplot.cpp were necessary to correct problems with drawing when the x-value does not change across a NaN value.
No comments, so all is well.
Do you have a feeling as to how much slower things are? Do not close the bug until this has been answered. On Friday 17 June 2005 18:33, Andrew Walker wrote: > problem. The changes to ascii.cpp should be reviewed as things may run > slower than before. If I don't get any feedback in the next couple of days > I'll mark this bug report as fixed.
I can certainly do some timing on it to get an estimate, but ultimately there may not be much we can do if we want the data to be read in correctly. Andrew -----Original Message----- From: netterfield@astro.utoronto.ca [mailto:netterfield@astro.utoronto.ca] Sent: Tuesday, June 21, 2005 6:32 AM To: kst@kde.org Subject: [Kst] [Bug 106008] NaN values should not be skipped while importingdata from ASCII file ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. http://bugs.kde.org/show_bug.cgi?id=106008 ------- Additional Comments From netterfield astro utoronto ca 2005-06-21 15:31 ------- Do you have a feeling as to how much slower things are? Do not close the bug until this has been answered. On Friday 17 June 2005 18:33, Andrew Walker wrote: > problem. The changes to ascii.cpp should be reviewed as things may run > slower than before. If I don't get any feedback in the next couple of days > I'll mark this bug report as fixed. _______________________________________________ Kst mailing list Kst@kde.org https://mail.kde.org/mailman/listinfo/kst
SVN commit 427790 by arwalker: CCMAIL:106008@bugs.kde.org ascii datasource now runs faster than the original version, by about 40 percent. The only difference between the two is that a line such as 'X Y Z' would previously have been skipped, but is now treated (correctly I believe) as '0 0 0' M +12 -5 ascii.cpp --- trunk/extragear/graphics/kst/kst/datasources/ascii/ascii.cpp #427789:427790 @@ -280,10 +280,12 @@ } _valid = true; - + int bufstart, bufread; bool new_data = false; - char tmpbuf[MAXBUFREADLEN]; + char tmpbuf[MAXBUFREADLEN+1]; + char* comment; + do { /* Read the tmpbuffer, starting at row_index[_numFrames] */ if (_byteLength - _rowIndex[_numFrames] > MAXBUFREADLEN) { @@ -295,10 +297,12 @@ bufstart = _rowIndex[_numFrames]; file.at(bufstart); // expensive? file.readBlock(tmpbuf, bufread); - + tmpbuf[bufread] = '\0'; + bool is_comment = false, has_dat = false; + comment = strpbrk(tmpbuf, _config->_delimiters.latin1()); for (int i = 0; i < bufread; i++) { - if (_config->_delimiters.contains(tmpbuf[i])) { + if (comment == &(tmpbuf[i])) { is_comment = true; } else if (tmpbuf[i] == '\n' || tmpbuf[i] == '\r') { if (has_dat) { @@ -311,12 +315,15 @@ } _rowIndex[_numFrames] = bufstart + i + 1; has_dat = is_comment = false; + if (comment && comment < &(tmpbuf[i])) { + comment = strpbrk(&(tmpbuf[i]), _config->_delimiters.latin1()); + } } else if (!is_comment && !isspace(tmpbuf[i])) { has_dat = true; } } } while (bufread == MAXBUFREADLEN); - + file.close(); updateNumFramesScalar();
I believe it would be better for a line like 'X Y Z' to generate 'NaN' values in the vectors, so that curves using these vectors are not drawn at these places. If non-numeric values are turned into 0, how can you know that these values are missing ? It could very well happen that a 0 value is physically acceptable for a vector (i.e. you can't tell there was a missing value just because it's 0), while very different from NaN (no data available, e.g. broken sensor or whatever).
On June 22, 2005 02:59 am, Nicolas Brisset wrote: > 08:59 ------- I believe it would be better for a line like 'X Y Z' to > generate 'NaN' values in the vectors, so that curves using these vectors > are not drawn at these places. Only if this can be done with no speed impact in the data source. Otherwise, the current behaviour is acceptable.
SVN commit 428063 by arwalker: CCMAIL:106008@bugs.kde.org Interpret text strings as NaN instead of 0.0 with no appreciable increase in read time of data M +6 -0 ascii.cpp --- trunk/extragear/graphics/kst/kst/datasources/ascii/ascii.cpp #428062:428063 @@ -414,6 +414,9 @@ i_col++; if (i_col == col) { v[i] = atof(_tmpBuf + ch); + if (v[i] == 0.0 && !isdigit(_tmpBuf[ch])) { + v[i] = atof("NaN"); + } break; } } @@ -441,6 +444,9 @@ i_col++; if (i_col == col) { v[i] = atof(_tmpBuf + ch); + if (v[i] == 0.0 && !isdigit(_tmpBuf[ch])) { + v[i] = atof("NaN"); + } break; } }