Bug 110734 - Ability to mask datapoints for fits and other operations etc.
Summary: Ability to mask datapoints for fits and other operations etc.
Status: CONFIRMED
Alias: None
Product: kst
Classification: Applications
Component: general
Version: 1.10.0
Platform: unspecified Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: kst
URL:
Keywords:
Duplicates: 112549
Depends on:
Blocks:
 
Reported: 2005-08-14 00:13 UTC by Matthew Truch
Modified: 2010-08-14 14:09 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Description Matthew Truch 2005-08-14 00:13:59 UTC
Version:           1.2.0_devel (using KDE 3.4.0)
OS:                Linux

Sometimes one's data contains points that are obviously bad or otherwise unwanted.  It would be very useful to be able to mask out data points of the user's choice, so that they are ignored by things like fits, autoscaling, and so on.

This is slightly related to bug 86915, although significantly broader in scope.

Rumor has it that one kst user even reverted to installing WINE so that they could install Origin to get this functionality on their computer.  Whoa!
Comment 1 Netterfield 2005-08-15 17:57:52 UTC
On August 13, 2005 06:13 pm, Matthew Truch wrote:
> Rumor has it that one kst user even reverted to installing WINE so that
> they could install Origin to get this functionality on their computer. 


Shocking - a scandal....
Comment 2 Andrew Walker 2005-08-15 19:02:53 UTC
The bug report is somewhat vague, but there are at least two scenarios for this:

1) the user interactively selects points for omission, using some key/mouse combination
2) points are automatically excluded if they meet some criterion (such as deviation from the mean)

Should we provide visual feedback that points have been omitted? Should such points even be drawn?
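As a rough illustration of scenario 2, here is a minimal sketch assuming the data is a plain std::vector<double> and the mask is a same-length std::vector<int> (non-zero = masked). The function name flagOutliers and the threshold k (in standard deviations) are hypothetical, not existing Kst API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: flag every sample whose deviation from the mean
// exceeds k standard deviations. Returns a flag vector of the same length
// as the data (1 = masked, 0 = kept).
std::vector<int> flagOutliers(const std::vector<double>& data, double k) {
    assert(k > 0.0);
    std::vector<int> flags(data.size(), 0);
    if (data.size() < 2) {
        return flags;  // too few samples for a meaningful sigma
    }

    double sum = 0.0;
    for (double v : data) sum += v;
    const double mean = sum / data.size();

    double sq = 0.0;
    for (double v : data) sq += (v - mean) * (v - mean);
    const double sigma = std::sqrt(sq / data.size());  // population std. dev.

    for (std::size_t i = 0; i < data.size(); ++i) {
        if (std::fabs(data[i] - mean) > k * sigma) {
            flags[i] = 1;
        }
    }
    return flags;
}
```

Note that a single pass like this computes the mean and sigma with the outliers still included, so large outliers inflate sigma; repeating the pass on the unflagged data (iterative clipping) is a common refinement.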
Comment 3 Netterfield 2005-08-15 19:13:08 UTC
IIRC, the way this is done in Origin is a spreadsheet mode.  The unwanted 
points are indicated somehow, but are still plotted - perhaps in a different 
colour.

But we need a way to handle 10^lots data points in a spreadsheet.  Rumour is, 
QT4 will support such things.

Then we need to figure out how to handle it internally.  It may be the same 
mechanism as will be needed to support 'fit to visible points only'...

This should not be done in a hurry - we need to think it through better.
Comment 4 Nicolas Brisset 2005-08-23 10:15:09 UTC
I like this idea... and Andrew's suggestions. They don't seem to be too difficult to implement or use, the biggest issue here probably being performance. 

I don't think it is necessary to be able to edit data points in a spreadsheet interface for this feature (Barth, can you give a use case where suggestion 1), e.g. with Del+LMB, or suggestion 2) above would not be convenient enough?). Just keeping an optional array of invalid points for each vector should be enough; there should not be too many of them in normal cases. Invalid points should be drawn with a different color or point style, and possibly excluded when drawing lines.

Thinking a bit more about this, I do see potential difficulties from a user-interaction perspective with Andrew's suggestion number 1: since curves are made up of 2 vectors (or more in some cases), how can the user specify which point is invalid? Can we assume that the Y value is wrong, or should we bring up a dialog for the user to indicate whether he wants to invalidate the X or the Y value (this would be my preferred solution)? And what about curves involving vectors of different lengths, where some points are actually interpolated: invalidate the closest point, or disable the feature?

I believe this feature is important and I hope it will be implemented in spite of these difficulties :-) 
As for the "fit to visible points only" feature, it should be possible to exclude whole ranges and not just isolated points. We just need to find the best solution to store isolated points as well as ranges in the "invalid points" container(s). Invalid data could then be excluded from all computations (fits, filters, etc...), which would require kst to pass "arranged" vectors to plugins. 
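One way to hold isolated points and whole ranges in a single "invalid points" container is an interval set keyed by the first index. The sketch below is an assumption, not a Kst design: the class name MaskRanges is hypothetical, indices are std::size_t, and an isolated point is simply a range with first == last. Overlapping or adjacent ranges are merged so lookups stay a single binary search.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical sparse mask: masked samples stored as disjoint, sorted,
// closed index ranges [first, last].
class MaskRanges {
public:
    // Mask indices first..last inclusive, merging with any overlapping
    // or adjacent ranges.
    void add(std::size_t first, std::size_t last) {
        assert(first <= last);
        auto it = ranges_.upper_bound(first);  // first range starting after `first`
        if (it != ranges_.begin()) {
            auto prev = std::prev(it);
            if (prev->second + 1 >= first) {  // overlaps or touches
                first = prev->first;
                if (prev->second > last) last = prev->second;
                it = ranges_.erase(prev);
            }
        }
        while (it != ranges_.end() && it->first <= last + 1) {
            if (it->second > last) last = it->second;
            it = ranges_.erase(it);
        }
        ranges_[first] = last;
    }

    bool isMasked(std::size_t i) const {
        auto it = ranges_.upper_bound(i);
        if (it == ranges_.begin()) return false;
        return std::prev(it)->second >= i;
    }

    std::size_t rangeCount() const { return ranges_.size(); }

private:
    std::map<std::size_t, std::size_t> ranges_;  // first -> last
};
```

With this layout, masking 1000 consecutive samples costs one map entry, while scattered single points cost one entry each.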
Comment 5 George Staikos 2005-09-13 22:19:16 UTC
On Monday 15 August 2005 13:12, Barth Netterfield wrote:
> IIRC, the way this is done in Origin is a spreadsheet mode.  The unwanted
> points are indicated somehow, but are still plotted - perhaps in a
> different colour.
>
> But we need a way to handle 10^lots data points in a spreadsheet.  Rumour
> is, QT4 will support such things.
>
> Then we need to figure out how to handle it internally.  It may be the same
> mechanism as will be needed to support 'fit to visible points only'...
>
> This should not be done in a hurry - we need to think it through better.


  Yes, Qt4 can do this efficiently.  Before that.... it will require 
reimplementing InterView.
Comment 6 Nicolas Brisset 2005-09-14 09:16:55 UTC
That sounds like a lot of work. How about the idea I proposed in bug #112549? Wouldn't that be enough until Qt4/KDE4 arrives?
It is true that pruned points would no longer be visible, whereas it may be better to somehow keep them and just plot them differently. But maybe we could (optionally ?) move the pruned points to new vectors instead of just deleting them to still be able to plot them if required (and if they are in specific vectors, then plotting them with different symbols/colors would be straightforward).
Comment 7 Netterfield 2005-09-14 12:02:05 UTC
*** Bug 112549 has been marked as a duplicate of this bug. ***
Comment 8 Netterfield 2005-09-14 12:52:12 UTC
Here is my proposed solution:

Vectors gain flag fields with something like the following methods:
  bool hasFlag()
  double flag(int i): Non-zero means ignore.
  double interpFlag(int i, int ns): flag for use with interpolate(i,ns)
  double *flagV(): return pointer to raw flag vector; NULL if !hasFlag().
  void clearFlag(): deletes the _flag and clears hasFlag().
  void setFlagV(KstVector V);
  void setFlagV(int *F);
  void setFlag(int i, int flag);
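To make the proposed interface concrete, here is a minimal sketch on a hypothetical stand-in class, FlaggedVector, rather than the real KstVector. Assumptions: flags are stored as doubles throughout (the proposal mixes int and double), flag(i) returns 0 when no flags exist, and the flag array is allocated lazily so that an unflagged vector carries no extra storage. interpFlag and the setFlagV overloads are omitted because their exact semantics depend on Kst internals.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Kst vector carrying the proposed flag API.
class FlaggedVector {
public:
    explicit FlaggedVector(std::vector<double> data) : data_(std::move(data)) {}

    std::size_t length() const { return data_.size(); }
    double value(std::size_t i) const { return data_[i]; }

    bool hasFlag() const { return !flag_.empty(); }

    // Non-zero means "ignore this sample"; always 0 when there are no flags.
    double flag(std::size_t i) const { return hasFlag() ? flag_[i] : 0.0; }

    // Raw flag array, or nullptr when !hasFlag().
    const double* flagV() const { return hasFlag() ? flag_.data() : nullptr; }

    // Drop the flag array entirely, returning to the unflagged state.
    void clearFlag() {
        flag_.clear();
        flag_.shrink_to_fit();
    }

    void setFlag(std::size_t i, double f) {
        assert(i < data_.size());
        if (!hasFlag()) flag_.assign(data_.size(), 0.0);  // allocate on first use
        flag_[i] = f;
    }

private:
    std::vector<double> data_;
    std::vector<double> flag_;  // empty <=> no flags
};
```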

All automatic statistics for a vector are calculated from the unflagged data only.  If the flag is cleared (!hasFlag()), there should be no speed impact and behavior is as it is now.
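A possible shape for this, sketched under assumptions: a free function (hypothetically named unflaggedStats) takes the data plus a raw flag pointer in the style of flagV(), where nullptr means "no flags" and a non-null pointer must address data.size() entries. Checking for nullptr once, outside the loop, keeps the unflagged path doing the same per-sample work as a plain statistics pass.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Stats {
    std::size_t count = 0;
    double mean = 0.0;
    double min = std::numeric_limits<double>::quiet_NaN();
    double max = std::numeric_limits<double>::quiet_NaN();
};

// Hypothetical: statistics over unflagged samples only.
// flags == nullptr means no flags; otherwise flags[i] != 0 excludes data[i].
Stats unflaggedStats(const std::vector<double>& data, const double* flags) {
    Stats s;
    double sum = 0.0;
    auto accumulate = [&](double v) {
        if (s.count == 0) {
            s.min = s.max = v;
        } else {
            if (v < s.min) s.min = v;
            if (v > s.max) s.max = v;
        }
        sum += v;
        ++s.count;
    };

    if (flags == nullptr) {
        for (double v : data) accumulate(v);  // no per-sample flag test
    } else {
        for (std::size_t i = 0; i < data.size(); ++i)
            if (flags[i] == 0.0) accumulate(data[i]);
    }
    if (s.count > 0) s.mean = sum / s.count;
    return s;
}
```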

Kst2DPlot gains a new mouse mode for flagging data.  It will work the same way as the zoom modes, except that the selected data is flagged rather than zoomed. It should be possible to decide whether you are flagging X data points, Y data points, or both.  The default will be to flag Y only. There will be an RMB entry for clearFlags, and for "select only visible".  These will also exist in the zoom mode RMB menu.
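The core of such a flagging mouse mode could look like the sketch below, assuming the dragged box has already been converted to data coordinates and that the X and Y vectors have equal length (no interpolation). The names FlagTarget, Box and flagInBox are hypothetical; the default target is Y, as proposed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class FlagTarget { X, Y, Both };

// Selection rectangle in data coordinates.
struct Box {
    double xmin, xmax, ymin, ymax;
};

// Hypothetical: flag every curve point inside the box, in the X flags,
// the Y flags, or both. Returns the number of points that were inside.
std::size_t flagInBox(const std::vector<double>& x, const std::vector<double>& y,
                      std::vector<int>& xFlag, std::vector<int>& yFlag,
                      const Box& box, FlagTarget target = FlagTarget::Y) {
    assert(x.size() == y.size());
    assert(xFlag.size() == x.size() && yFlag.size() == y.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const bool inside = x[i] >= box.xmin && x[i] <= box.xmax &&
                            y[i] >= box.ymin && y[i] <= box.ymax;
        if (!inside) continue;
        if (target != FlagTarget::Y) xFlag[i] = 1;  // X or Both
        if (target != FlagTarget::X) yFlag[i] = 1;  // Y or Both
        ++n;
    }
    return n;
}
```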

Kst2DPlot gains the ability (with configuration in the plot dialog) to indicate flagged data (as different colour, different line type, or not drawn at all).  The X vector and Y vector flags will be ORed.
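For the "not drawn at all" option, one approach is to OR the X and Y flags and split the curve into runs of unflagged points, drawing each run as its own polyline so that gaps appear where flagged points were. This is a sketch under the assumption of equal-length int flag vectors; drawableRuns is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: return the [first, last] index ranges of points that are
// unflagged in both X and Y (flags ORed), i.e. the segments to draw.
std::vector<std::pair<std::size_t, std::size_t>>
drawableRuns(const std::vector<int>& xFlag, const std::vector<int>& yFlag) {
    assert(xFlag.size() == yFlag.size());
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    bool inRun = false;
    std::size_t start = 0;
    for (std::size_t i = 0; i < xFlag.size(); ++i) {
        const bool flagged = xFlag[i] != 0 || yFlag[i] != 0;  // X and Y ORed
        if (!flagged && !inRun) {
            start = i;
            inRun = true;
        } else if (flagged && inRun) {
            runs.emplace_back(start, i - 1);  // close the current segment
            inRun = false;
        }
    }
    if (inRun) runs.emplace_back(start, xFlag.size() - 1);
    return runs;
}
```

The other display options (different colour or line type) can reuse the same runs: draw the unflagged runs normally and the complementary ranges in the alternate style.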

Vector dialogs gain a combo box for associating another vector with the flag.  This allows flagging with an equation, or with the output of a plugin.

The double[] arrays that KstPlugins send to plugins only contain unflagged data.  A checkbox will be provided in the dialog to select whether flags from equal-length vectors should be ORed before applying.
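The compaction step might look like the sketch below. It is not the KstPlugin API: compactForPlugin is a hypothetical name, flags are int vectors (non-zero = masked), and the behaviour when input lengths differ (each input falls back to its own flags) is an assumption. With orFlags set, a sample masked in any input is dropped from every input, which keeps paired arrays such as X and Y aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: build the arrays a plugin would receive, keeping only
// unflagged samples. flags[k] is the flag vector for inputs[k].
std::vector<std::vector<double>>
compactForPlugin(const std::vector<std::vector<double>>& inputs,
                 const std::vector<std::vector<int>>& flags, bool orFlags) {
    assert(inputs.size() == flags.size());
    bool equalLength = true;
    for (std::size_t k = 0; k < inputs.size(); ++k) {
        assert(flags[k].size() == inputs[k].size());
        if (inputs[k].size() != inputs.front().size()) equalLength = false;
    }

    // Combined mask: OR of all flag vectors, only for equal-length inputs.
    const bool useCombined = orFlags && equalLength && !inputs.empty();
    std::vector<int> combined;
    if (useCombined) {
        combined.assign(inputs.front().size(), 0);
        for (const auto& f : flags)
            for (std::size_t i = 0; i < f.size(); ++i)
                if (f[i] != 0) combined[i] = 1;
    }

    std::vector<std::vector<double>> out;
    for (std::size_t k = 0; k < inputs.size(); ++k) {
        const std::vector<int>& mask = useCombined ? combined : flags[k];
        std::vector<double> kept;
        for (std::size_t i = 0; i < inputs[k].size(); ++i)
            if (mask[i] == 0) kept.push_back(inputs[k][i]);
        out.push_back(std::move(kept));
    }
    return out;
}
```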

Eventually a spreadsheet mode will be provided which will allow direct access to the flag vectors for hand editing.
Comment 9 Nicolas Brisset 2005-09-14 17:16:24 UTC
Hum, sounds very good indeed :-) It looks like this will provide the same functionality as I was describing, with a better user interface. I can't wait for that now :-)

A few remarks:
1) I suppose the "select only visible" menu item would flag non-visible points (to allow for instance restricting vectors to visible points only easily), otherwise an "invert flag" entry could be handy
2) would it be worth trying to come up with a flagging scheme that does not require a flag vector of the same length as the data vector (maybe that's already the plan, but from what I understand they will be the same length)? In many cases flagged points will be few compared to valid data points, or whole ranges will be excluded: e.g. 1000 points from 10 to 1009 could be described with just 2 integers (10;1009) instead of 1000!
3) it would be nice to add the options for displaying flagged points to a sub-menu of the RMB context menu as well as general settings
Comment 10 Peter Kümmel 2010-08-14 14:09:58 UTC
This could still be open in Kst 1.