Bug 130195 - Datasources should enforce field uniquity
Summary: Datasources should enforce field uniquity
Status: RESOLVED FIXED
Alias: None
Product: kst
Classification: Applications
Component: datasources (show other bugs)
Version: 1.x
Platform: unspecified Solaris
: NOR normal
Target Milestone: ---
Assignee: kst
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-03 14:51 UTC by Nicolas Brisset
Modified: 2007-05-09 19:30 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Testcase for bug #130195 (261 bytes, text/plain)
2006-07-03 14:53 UTC, Nicolas Brisset
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nicolas Brisset 2006-07-03 14:51:31 UTC
Version:           1.3.0_devel (using KDE 3.4.0, compiled sources)
Compiler:          gcc version 3.4.3
OS:                SunOS (sun4u) release 5.8

There currently is a flaw/excessive flexibility in some datasources (like ASCII for instance) where the provided field list can contain multiple entries with the same name. 
The problem is that the datawizard allows selecting any of the entries while in fact only the first one can be accessed.
That should be fixed as there will be (in fact, there have been!) cases of users looking at some curves which don't come from the right field.

To illustrate the problem, I'll attach a small data file shortly. Select the second "VAR" field in the datawizard, and notice that it is the first one that actually gets plotted (of course, you have to configure the datasource so that fields are accessed by their names, otherwise the names are unique indeed).

The fix should be fairly simple: the datasource should check unicity before adding the fields when building the list, and remember the mapping for transformed names so that the data can be accessed later. The only concern here may be to provide a very fast lookup for the cases a user may have many variables (a few thousand is not uncommon around here!). I don't know what the best solution is, but I suspect Qt provides useful classes for that... I have no idea what the fasted solution is, though.

In most cases where an external lib is used to access the file (like CDF or netCDF) I believe this bug does not apply as the library will probably complain during file creation if a user tries to create two identical field names (at least, I hope so even though I still need to check!). But I think it should be fixed for ASCII, and possibly other formats which may exhibit the same problem.
Comment 1 Nicolas Brisset 2006-07-03 14:53:16 UTC
Created attachment 16873 [details]
Testcase for bug #130195

Data file to illustrate the problem. You can load all three "VAR" vectors in
the datawizard and look at the result. Not quite what one would expect :-(
Comment 2 George Staikos 2007-03-04 22:34:43 UTC
This bug is per-datasource plugin and really nothing to do with Kst proper.  I'm tempted to close and let individual bugs be filed against each source that creates inaccessible fields.
Comment 3 Nicolas Brisset 2007-03-05 13:03:00 UTC
It's true that this is per-datasource. The only thing I see that kst could do is check for duplicate entries the first time it gets the list, and warn the user that some fields will not be accessible, with a hint of which datasource is being used and how to create the report...
Comment 4 Andrew Walker 2007-05-07 22:42:54 UTC
SVN commit 662309 by arwalker:

BUG:130195 warn the user if there are duplicate field names in their data source

 M  +18 -0     kstdatasource.cpp  


--- branches/work/kst/1.5/kst/src/libkst/kstdatasource.cpp #662308:662309
@@ -334,6 +334,24 @@
     QString typeSuggestion;
     rc = (*i).plugin->fieldList(kConfigObject, fn, QString::null, &typeSuggestion, complete);
     if (!rc.isEmpty()) {
+      //
+      // check for duplicate field names and warn the user if necessary...
+      //
+      QStringList::const_iterator it = rc.begin();
+      QString str;
+
+      for (; it != rc.end(); ) {
+        str = (*it);
+        ++it;
+        if (it != rc.end()) {
+          if (rc.find(it, str) != rc.end()) {
+            KstDebug::self()->log(i18n( "The datasource has at least one duplicate field name; '%1'. As a result one or more fields will not be accessible." ).arg(str), KstDebug::Error);
+
+            break;
+          }
+        }
+      }
+
       if (outType) {
         if (typeSuggestion.isEmpty()) {
           *outType = (*i).plugin->provides()[0];
Comment 5 Nicolas Brisset 2007-05-09 10:03:29 UTC
Right, this never really got fixed... I think it's sane to do something about it, I'm just a bit worried about the performance penalty here. 
All these calls to rc.find() might turn out to become a problem when there are many variables in the file. I have colleagues who routinely use data files with 10 000 variables ! I haven't had a chance to test this yet, but isn't there a quicker way ? QStringList sorting being O(n*log n), maybe sorting a copy of the list, then looking only at consecutive entries in the sorted list would be faster overall ? Note that I don't know how fast/slow find() is, though...
Comment 6 Andrew Walker 2007-05-09 19:30:34 UTC
SVN commit 662958 by arwalker:

CCBUG:130195 Make search for duplicate field names more efficient

 M  +19 -21    kstdatasource.cpp  


--- branches/work/kst/1.5/kst/src/libkst/kstdatasource.cpp #662957:662958
@@ -330,31 +330,13 @@
 
   QValueList<PluginSortContainer> bestPlugins = bestPluginsForSource(fn, type);
   QStringList rc;
-  for (QValueList<PluginSortContainer>::Iterator i = bestPlugins.begin(); i != bestPlugins.end(); ++i) {
+  for (QValueList<PluginSortContainer>::Iterator it = bestPlugins.begin(); it != bestPlugins.end(); ++it) {
     QString typeSuggestion;
-    rc = (*i).plugin->fieldList(kConfigObject, fn, QString::null, &typeSuggestion, complete);
+    rc = (*it).plugin->fieldList(kConfigObject, fn, QString::null, &typeSuggestion, complete);
     if (!rc.isEmpty()) {
-      //
-      // check for duplicate field names and warn the user if necessary...
-      //
-      QStringList::const_iterator it = rc.begin();
-      QString str;
-
-      for (; it != rc.end(); ) {
-        str = (*it);
-        ++it;
-        if (it != rc.end()) {
-          if (rc.find(it, str) != rc.end()) {
-            KstDebug::self()->log(i18n( "The datasource has at least one duplicate field name; '%1'. As a result one or more fields will not be accessible." ).arg(str), KstDebug::Error);
-
-            break;
-          }
-        }
-      }
-
       if (outType) {
         if (typeSuggestion.isEmpty()) {
-          *outType = (*i).plugin->provides()[0];
+          *outType = (*it).plugin->provides()[0];
         } else {
           *outType = typeSuggestion;
         }
@@ -363,6 +345,22 @@
     }
   }
 
+  if (!rc.isEmpty()) {
+    //
+    // check for duplicate field names and warn the user if necessary...
+    //
+    QMap<QString, QString> map;
+
+    for (QStringList::const_iterator it = rc.begin(); it != rc.end(); ++it) {
+      map.insert(*it, *it);
+    }
+
+    if (map.size() != rc.size()) {
+      KstDebug::self()->log( i18n("The datasource '%1' has at least one duplicate field name. As a result one or more fields will not be accessible.").arg(filename), KstDebug::Error);
+    }
+  }
+
+
   return rc;
 }