Version: 1.2.0_devel (using KDE 3.4.0, compiled sources)
Compiler: gcc version 3.4.3
OS: SunOS (sun4u) release 5.8

When loading an ASCII file with field names, if the corresponding line exceeds 1000 characters, the field list is truncated. I suppose this is linked to the fact that the QString "line" containing the field names is read with file.readLine(line, 1000) in AsciiSource::fieldListFor(...). In the version I wrote initially, I read the QString "line" with a QTextStream attached to the QFile "file", since QTextStream's readLine() method reads the complete line, but there may be another/better way.
SVN commit 437726 by arwalker:

BUG:109456 Read in a complete line even if it extends beyond 1000 characters.

 M  +49 -27    ascii.cpp
 M  +3 -1      ascii.h

--- trunk/extragear/graphics/kst/kst/datasources/ascii/ascii.cpp #437725:437726
@@ -15,14 +15,12 @@
  *                                                                         *
  ***************************************************************************/
-#include "ascii.h"
-#include "asciiconfig.h"
-#include <kstmath.h>
+#include <assert.h>
+#include <ctype.h>
+#include <math.h>
+#include <stdlib.h>
-#include <kcombobox.h>
-#include <kdebug.h>
-
 #include <qcheckbox.h>
 #include <qfile.h>
 #include <qfileinfo.h>
@@ -33,11 +31,13 @@
 #include <qspinbox.h>
 #include <qstylesheet.h>
-#include <assert.h>
-#include <ctype.h>
-#include <math.h>
-#include <stdlib.h>
+#include <kcombobox.h>
+#include <kdebug.h>
+#include <kstmath.h>
+#include "ascii.h"
+#include "asciiconfig.h"
+
 #ifndef INF
 double INF = 1.0/0.0;
 #endif
@@ -217,6 +217,31 @@
 }
 
+int AsciiSource::readFullLine(QFile &file, QString &str) {
+  int read;
+
+  read = file.readLine(str, 1000);
+
+  if (read == 1000-1) {
+    QString strExtra;
+    int readExtra;
+
+    while (str[read-1] != '\n') {
+      readExtra = file.readLine(strExtra, 1000);
+      if (readExtra > 0) {
+        read += readExtra;
+        str += strExtra;
+      } else {
+        read = readExtra;
+        break;
+      }
+    }
+  }
+
+  return read;
+}
+
+
 bool AsciiSource::initRowIndex() {
   if (!_rowIndex) {
     _rowIndex = (int *)malloc(32768 * sizeof(int));
@@ -234,11 +259,8 @@
   int left = _config->_dataLine;
   int didRead = 0;
   QString ignore;
-#if QT_VERSION >= 0x030200
-  ignore.reserve(1001);
-#endif
   while (left > 0) {
-    int thisRead = file.readLine(ignore, 1000);
+    int thisRead = readFullLine(file, ignore);
     if (thisRead <= 0 || file.atEnd()) {
       return false;
     }
@@ -602,7 +624,7 @@
   int l = cfg->_fieldsLine;
   QString line;
   while (!file.atEnd()) {
-    int r = file.readLine(line, 1000);
+    int r = readFullLine(file, line);
     if (l-- == 0) {
       if (r >= 0) {
         if (cfg->_columnType == AsciiSource::Config::Custom && !cfg->_columnDelimiter.isEmpty()) {
@@ -634,19 +656,19 @@
   QString line;
   int skip = cfg->_dataLine;
   while (!file.atEnd() && !done) {
-    int r = file.readLine(line, 1000);
-      if (skip > 0) {
-        --skip;
-        if (r < 0) {
-          return rc;
-        }
-        continue;
-      }
-      if (r > 1 && !re.exactMatch(line)) {
-        done = true;
-      } else if (r < 0) {
+    int r = readFullLine(file, line);
+    if (skip > 0) {
+      --skip;
+      if (r < 0) {
         return rc;
       }
+      continue;
+    }
+    if (r > 1 && !re.exactMatch(line)) {
+      done = true;
+    } else if (r < 0) {
+      return rc;
+    }
   }
 
   file.close();
@@ -897,7 +919,7 @@
   int skip = config._dataLine;
   while (!done) {
-    rc = f.readLine(s, 1000);
+    rc = AsciiSource::readFullLine(f, s);
     if (skip > 0) {
       --skip;
       if (rc <= 0) {
--- trunk/extragear/graphics/kst/kst/datasources/ascii/ascii.h #437725:437726
@@ -18,15 +18,17 @@
 #ifndef ASCII_H
 #define ASCII_H
 
+#include <qfile.h>
+
 #include <kstdatasource.h>
 
-
 class AsciiSource : public KstDataSource {
   public:
     AsciiSource(KConfig *cfg, const QString& filename, const QString& type, const QDomElement& e = QDomElement());
 
     ~AsciiSource();
 
+    static int readFullLine(QFile &file, QString &str);
     bool initRowIndex();
 
     KstObject::UpdateType update(int = -1);
I'm not sure I like the idea of reading a non-fixed amount, for a few reasons. 1) Pointing Kst at an invalid file, even temporarily (think auto-completion), could crash Kst (or the entire system) or consume a huge amount of memory and CPU. 2) It's considerably less efficient in a very performance-critical piece of code. I can definitely buy the argument that 1000 bytes is too little for a line, though. Why don't we bump it up to 10000 instead? Are there real use cases for > 10000 that justify the performance hit and stability issues?
I hadn't thought of the problem you mention. However, I have already seen files with so many variables that the header line exceeded 20000 bytes (and I was happy to see that Kst could handle them with the ASCII reader I had implemented initially). I agree that this may be a bit extreme, but I think reading even 50000 bytes would neither crash a system nor hurt performance too much. So I'd suggest making it 50000, and if there are still problems, adding an option to the ASCII config dialog.
If the file has lines with 50,000 characters, it's reasonable to expect the data file to be 2.5GB or larger. Is that really reasonable for an ASCII file? You could argue that only one line has 50,000 characters, but I would have to say that this is quite silly, and perhaps it's time to cut down the comments or shorten the column names to 16 characters or fewer.
I don't understand why a file with 2500 variables of 16 characters each would necessarily reach 2.5 GB. The header line with the variable names, like each data line ("frame"), would be 2500 x 16 = 40000 bytes, and there could be as few as 10 or 100 frames...