Version: 1.2.0_devel (using KDE 3.4.0, compiled sources)
Compiler: gcc version 3.4.3
OS: SunOS (sun4u) release 5.8

When loading an ASCII file with field names, if the corresponding line exceeds 1000 characters, the field list is truncated. I suppose this is linked to the fact that the QString "line" containing the field names is read with file.readLine(line, 1000) in AsciiSource::fieldListFor(...). In the version I wrote initially, I read the QString "line" with a QTextStream attached to the QFile "file", since QTextStream's readLine() method reads the complete line, but there may be another/better way.
SVN commit 437726 by arwalker:

BUG:109456 Read in a complete line even if it extends beyond 1000 characters.

 M  +49 -27    ascii.cpp
 M  +3 -1      ascii.h

--- trunk/extragear/graphics/kst/kst/datasources/ascii/ascii.cpp #437725:437726
@@ -15,14 +15,12 @@
  *                                                                         *
  ***************************************************************************/
-#include "ascii.h"
-#include "asciiconfig.h"
-#include <kstmath.h>
+#include <assert.h>
+#include <ctype.h>
+#include <math.h>
+#include <stdlib.h>
-#include <kcombobox.h>
-#include <kdebug.h>
-
 #include <qcheckbox.h>
 #include <qfile.h>
 #include <qfileinfo.h>
@@ -33,11 +31,13 @@
 #include <qspinbox.h>
 #include <qstylesheet.h>
-#include <assert.h>
-#include <ctype.h>
-#include <math.h>
-#include <stdlib.h>
+#include <kcombobox.h>
+#include <kdebug.h>
+#include <kstmath.h>
+#include "ascii.h"
+#include "asciiconfig.h"
+
 #ifndef INF
 double INF = 1.0/0.0;
 #endif
@@ -217,6 +217,31 @@
 }
 
+int AsciiSource::readFullLine(QFile &file, QString &str) {
+  int read;
+
+  read = file.readLine(str, 1000);
+
+  if (read == 1000-1) {
+    QString strExtra;
+    int readExtra;
+
+    while (str[read-1] != '\n') {
+      readExtra = file.readLine(strExtra, 1000);
+      if (readExtra > 0) {
+        read += readExtra;
+        str += strExtra;
+      } else {
+        read = readExtra;
+        break;
+      }
+    }
+  }
+
+  return read;
+}
+
+
 bool AsciiSource::initRowIndex() {
   if (!_rowIndex) {
     _rowIndex = (int *)malloc(32768 * sizeof(int));
@@ -234,11 +259,8 @@
   int left = _config->_dataLine;
   int didRead = 0;
   QString ignore;
-#if QT_VERSION >= 0x030200
-  ignore.reserve(1001);
-#endif
   while (left > 0) {
-    int thisRead = file.readLine(ignore, 1000);
+    int thisRead = readFullLine(file, ignore);
     if (thisRead <= 0 || file.atEnd()) {
       return false;
     }
@@ -602,7 +624,7 @@
   int l = cfg->_fieldsLine;
   QString line;
   while (!file.atEnd()) {
-    int r = file.readLine(line, 1000);
+    int r = readFullLine(file, line);
     if (l-- == 0) {
       if (r >= 0) {
         if (cfg->_columnType == AsciiSource::Config::Custom && !cfg->_columnDelimiter.isEmpty()) {
@@ -634,19 +656,19 @@
   QString line;
   int skip = cfg->_dataLine;
   while (!file.atEnd() && !done) {
-    int r = file.readLine(line, 1000);
-      if (skip > 0) {
-        --skip;
-        if (r < 0) {
-          return rc;
-        }
-        continue;
-      }
-      if (r > 1 && !re.exactMatch(line)) {
-        done = true;
-      } else if (r < 0) {
+    int r = readFullLine(file, line);
+    if (skip > 0) {
+      --skip;
+      if (r < 0) {
         return rc;
       }
+      continue;
+    }
+    if (r > 1 && !re.exactMatch(line)) {
+      done = true;
+    } else if (r < 0) {
+      return rc;
+    }
   }
 
   file.close();
@@ -897,7 +919,7 @@
   int skip = config._dataLine;
   while (!done) {
-    rc = f.readLine(s, 1000);
+    rc = AsciiSource::readFullLine(f, s);
     if (skip > 0) {
       --skip;
       if (rc <= 0) {
--- trunk/extragear/graphics/kst/kst/datasources/ascii/ascii.h #437725:437726
@@ -18,15 +18,17 @@
 #ifndef ASCII_H
 #define ASCII_H
 
+#include <qfile.h>
+
 #include <kstdatasource.h>
 
-
 class AsciiSource : public KstDataSource {
   public:
     AsciiSource(KConfig *cfg, const QString& filename, const QString& type, const QDomElement& e = QDomElement());
 
     ~AsciiSource();
 
+    static int readFullLine(QFile &file, QString &str);
     bool initRowIndex();
 
     KstObject::UpdateType update(int = -1);
I'm not sure I like the idea of reading a non-fixed amount, for a few reasons. 1) Pointing Kst at an invalid file, even temporarily (think auto-completion), could crash Kst (or the entire system) or consume a huge amount of memory and CPU. 2) It's considerably less efficient in a very performance-critical piece of code. I can definitely buy the argument that 1000 bytes is too little for a line, though. Why don't we bump it up to 10000 instead? Are there real use cases for > 10000 that justify the performance hit and stability issues?
I hadn't thought of the problem you mention. However, I have already seen files with so many variables that the header line exceeded 20000 bytes (and I was happy to see that Kst could handle them with the ASCII reader I had implemented initially). I agree that this may be a bit extreme, but I think reading even 50000 bytes would neither crash a system nor hurt performance too much. So I'd suggest making it 50000, and if there are still problems, adding an option to the ASCII config dialog.
If the file has lines with 50,000 characters, it's reasonable to expect the data file to be 2.5GB or larger. Is that really reasonable for an ASCII file? You could argue that only one line has 50,000 characters, but I would have to say that this is quite silly, and perhaps it's time to cut down the comments or shorten the column names to 16 characters or fewer.
I don't understand why a file with 2500 variables of 16 characters each would necessarily reach 2.5 GB. The header line with the variable names, like each data line ("frame"), would be 2500 x 16 = 40000 bytes, and there could be as few as 10 or 100 frames...