Bug 243684 - Handling of formatted timestamps as a valid field in input files
Summary: Handling of formatted timestamps as a valid field in input files
Status: RESOLVED FIXED
Alias: None
Product: kst
Classification: Applications
Component: datasources
Version: 2.0.4
Platform: RedHat Enterprise Linux Linux
Importance: NOR wishlist
Target Milestone: 2.0.5
Assignee: kst
URL:
Keywords:
Duplicates: 243445
Depends on:
Blocks:
 
Reported: 2010-07-05 20:23 UTC by stijn.ilsen
Modified: 2013-11-28 07:21 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
ASCII time/date config options with time submenu (98.49 KB, image/png)
2012-02-24 22:05 UTC, Nicolas Brisset
ASCII time/date config options with date submenu (97.35 KB, image/png)
2012-02-24 22:05 UTC, Nicolas Brisset
QtDesigner file used for the mockups - ready to be coded :-) (24.39 KB, application/xml)
2012-02-24 22:54 UTC, Nicolas Brisset

Description stijn.ilsen 2010-07-05 20:23:27 UTC
Version:           1.9.1 (using KDE 4.4.2) 
OS:                Linux

It would be very nice if a formatted timestamp could be recognised as a valid field of an input data file.
For example:
2010/06/25 12:00:00.000 Par_value1 Par_value2

Field 1 would then be the date, field 2 the time. Not all formatting can be supported, maybe just the possible formatting of the X-axis (C-time, kde, ...)

I'm currently doing pre-processing, to convert the formatted date to a number of seconds. I use the date command for this which apparently takes quite some processing power. The feature might therefore pose some processing problems as well. 
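
For illustration, the kind of pre-processing described above could look roughly like this in Python. This is only a sketch and not part of kst: the format string matches the example line, the function name is made up, and treating the timestamps as UTC is an assumption, since the report does not say which timezone the files use.

```python
from datetime import datetime, timezone

# Layout of the first two fields in the example line (date, then time).
STAMP_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def line_to_epoch(line):
    """Split one data line into (seconds since 1970-01-01, remaining fields)."""
    date_field, time_field, *values = line.split()
    stamp = datetime.strptime(f"{date_field} {time_field}", STAMP_FORMAT)
    # Assumption: the file stores UTC; adjust tzinfo if it stores local time.
    seconds = stamp.replace(tzinfo=timezone.utc).timestamp()
    return seconds, values


seconds, values = line_to_epoch("2010/06/25 12:00:00.000 Par_value1 Par_value2")
print(seconds, values)
```

Parsing in-process like this avoids spawning one external date command per line, which is probably where most of the processing cost mentioned above comes from.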

Reproducible: Always

Steps to Reproduce:
N/A

Actual Results:  
N/A

Expected Results:  
N/A

N/A
Comment 1 Nicolas Brisset 2011-12-04 11:42:18 UTC
We have a volunteer to handle this within Google Code-In 2011, see http://www.google-melange.com/gci/task/view/google/gci2011/7120226.
I'm moving the discussion to here and reallocating to the next version (2.0.5) so that we can better keep track of it, attach samples etc.
Comment 2 Nicolas Brisset 2011-12-04 12:06:52 UTC
For this feature we'll need to update:
1) the GUI (ASCII config dialog)
2) the backend code which takes care of reading the data and providing it to kst core

Copying from the message I sent to the list yesterday, here are my current thoughts:
1) GUI:
there we have to ask the user for the minimal input we can't guess. It is a bit tricky since I'd like to support both cases where date and time are together (YYYYMMDD-HHMMSS.SSS, I have such files!) and where they are in 2 vectors (e.g. YYYYMMDD and HHMMSS.SSS in 2 separate vectors). The idea would be to allow having one column which contains time and possibly date, and another optional one in case we have a separate column for the date. Since the drawing code probably considers only one X vector, we would need to merge the two in case they are separate. So in the end we'll receive from the config dialog in the _config member the following options:
   a. (bool) ASCII-formatted time/date
   b. (QString) name of the time/date vector
   c. (QString) format string of the time/date vector, e.g. HH:MM:SS or YYYYMMDD-HHMMSS.SSS
   d. (QString) Meaning of 0, i.e. if 000000.000 in the file actually means 20111203-120000.000 in reality because the experiment started today at noon, then we must have a way to let kst know it (I think that was also Matt's request recently)
   e. (bool) Merge separate ASCII-formatted date
   f. (QString) name of the date vector
   g. (QString) format string of the date vector, e.g. YYYYMMDD

2) backend code: in a first step until we have clarified the way we transfer date/time vectors from the datasources to kst core, I'd create internally an array of QDateTime elements based on the provided config options, which we'd return in the double * pointer in readField() when reading the time/date vector indicated in the config dialog. We may have to cast to ints representing the Unix time (time_t type). It is ugly, but we can adapt it later easily, and it should work for at least some formats. Barth, as soon as you know better be sure to let us know. Maybe there is already more appropriate support for time vectors but I'm not familiar with it. One interesting question there could be whether we should cache the time data and construct it directly when parsing the file, to avoid having to manipulate QDateTime objects continuously, which may be heavy. Peter, do you have an opinion on that?
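
To make options a-g more concrete, here is a rough Python sketch of how such a configuration could drive the conversion of one row into a single double. It is not the kst implementation: the class and function names are hypothetical, the format strings use Python's strptime syntax rather than the YYYYMMDD style above, "Meaning of 0" is given as an ISO 8601 string, and UTC is assumed throughout.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AsciiTimeConfig:
    """Hypothetical mirror of options a-g, not the real _config layout."""
    formatted_time: bool = False      # a. ASCII-formatted time/date
    time_vector: str = ""             # b. name of the time/date vector
    time_format: str = ""             # c. format of the time/date vector
    time_zero: Optional[str] = None   # d. meaning of 0, as an ISO 8601 string
    merge_date: bool = False          # e. merge a separate date column
    date_vector: str = ""             # f. name of the date vector
    date_format: str = ""             # g. format of the date vector


def _epoch(stamp):
    # Assumption: all timestamps are UTC.
    return stamp.replace(tzinfo=timezone.utc).timestamp()


def row_to_seconds(cfg, time_text, date_text=None):
    """Return seconds since the Epoch for one row of the file."""
    if cfg.merge_date:
        # Separate date and time columns are joined and parsed in one go.
        fmt = f"{cfg.date_format} {cfg.time_format}"
        return _epoch(datetime.strptime(f"{date_text} {time_text}", fmt))
    stamp = datetime.strptime(time_text, cfg.time_format)
    if cfg.time_zero is None:
        return _epoch(stamp)  # the column already holds an absolute date
    # Time-only column: strptime fills in 1900-01-01, so subtract it to get
    # the elapsed time, then add the user-supplied meaning of 0.
    elapsed = (stamp - datetime(1900, 1, 1)).total_seconds()
    return _epoch(datetime.fromisoformat(cfg.time_zero)) + elapsed


cfg = AsciiTimeConfig(formatted_time=True, time_vector="Time",
                      time_format="%H%M%S.%f", time_zero="2011-12-03T12:00:00")
print(row_to_seconds(cfg, "000130.500"))
```

Returning plain doubles keeps the result compatible with what readField() hands back for other vectors; whether the values should be cached at parse time is left open here, as in the comment above.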
Comment 3 Netterfield 2011-12-05 13:50:37 UTC
-the data source should return time in double precision, like all other vectors.  The plots know how to interpret double precision ctime-epoch seconds, and double precision Julian Day.
-The user should be able to choose between seconds since the start of 1970 ('ctime') or Julian Day.  If the date is before 1970, only Julian Day should be accepted.
-A double precision 'ctime' epoch time has ~1 microsecond precision.  Julian day is more like a ms.
-Date and Time in separate columns should be merged into a single vector.  I can't see a use case to keep them separate.
-There are lots of time formats!  We need a way to detect/select them.
ISO_8601 should be easily detectable.  Many others, however, are not.  We could use some heuristics to guess defaults (eg, MM <= 12, or dd <= 31, etc), in order to set defaults, and disable some options, but it isn't going to be easy to be accurate in general.
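
A minimal sketch of the heuristic idea, in Python: try a small list of candidate layouts against a few sample values and keep only those that parse every sample. The candidate list and the function name are illustrative assumptions, and the formats use Python's strptime syntax.

```python
from datetime import datetime

# Candidate layouts to try; this list is illustrative, not exhaustive.
CANDIDATES = ["%Y-%m-%dT%H:%M:%S", "%Y/%m/%d", "%d/%m/%Y", "%m/%d/%Y"]


def plausible_formats(samples, candidates=CANDIDATES):
    """Return the candidate formats that parse every sample string."""
    matches = []
    for fmt in candidates:
        try:
            for text in samples:
                datetime.strptime(text, fmt)
        except ValueError:
            continue  # at least one sample does not fit this layout
        matches.append(fmt)
    return matches


# A day value above 12 rules out month-first, so only one candidate is left.
print(plausible_formats(["25/06/2010", "13/07/2010"]))
# With small day and month values, both orders remain possible.
print(plausible_formats(["01/02/2010"]))
```

When more than one format survives, the dialog would still have to ask the user, which matches the point that heuristics can set defaults and disable options but cannot be accurate in general.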

for ISO 8601, see 

http://en.wikipedia.org/wiki/ISO_8601 

In Canada, we have absorbed many different time 'standards' so I am particularly sensitive to this mess.

http://en.wikipedia.org/wiki/Date_and_time_notation_in_Canada
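
As a rough check of the two representations mentioned above, the following Python sketch converts between ctime seconds and Julian Day and prints the spacing between adjacent doubles for each. The constant 2440587.5 is the Julian Day of 1970-01-01 00:00 UTC; the function names and the sample value are made up for illustration, and math.ulp needs Python 3.9 or later.

```python
import math

UNIX_EPOCH_JD = 2440587.5   # Julian Day of 1970-01-01 00:00 UTC
SECONDS_PER_DAY = 86400.0


def ctime_to_jd(seconds):
    """Convert seconds since the Epoch to a Julian Day."""
    return seconds / SECONDS_PER_DAY + UNIX_EPOCH_JD


def jd_to_ctime(jd):
    """Convert a Julian Day to seconds since the Epoch."""
    return (jd - UNIX_EPOCH_JD) * SECONDS_PER_DAY


ctime = 1323093037.0        # a sample ctime value from December 2011
jd = ctime_to_jd(ctime)

# math.ulp gives the gap to the next representable double, which is the
# finest time step each representation can hold near this date.
print("ctime resolution (s):", math.ulp(ctime))
print("Julian Day resolution (s):", math.ulp(jd) * SECONDS_PER_DAY)
```

For present-day dates this should show a resolution well below a microsecond for ctime and in the range of tens of microseconds for Julian Day, so ctime is the finer of the two by a couple of orders of magnitude.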
Comment 4 Nicolas Brisset 2011-12-05 21:06:27 UTC
The Code-In student has announced he did not have enough experience to do that:-( 
But let's continue anyway, now that we got it started!

I have thought about it a bit more and it's starting to come together. I need just a bit of (free) time to put my ideas into a UI, which will hopefully be so clear that you (Barth, Peter, Code-In student?) can hack away at the implementation. Basically, I'd suggest something along the lines of (T0 being the time/date of the first sample in the file, if the file counts only relative time and the user wants to display it absolute):
o no specific time treatment
o specific time treatment
   o read as numerical value: KstVectorSelect, interpret as: QComboBox(how to interpret)
   o read as ASCII-formatted string: KstVectorSelect, format: QLineEdit(time format) - T0: QLineEdit(T0)
      x Merge with date info KstVectorSelect, format: QLineEdit(date format)
   o fixed time step: QLineEdit(delta t) - T0: QLineEdit(T0)

I think with that we can cover nicely more or less all the cases, including NI with the last option and minimal copy/pasting from the 100 line preview.
Points to clarify:
1) what else do we need to ask the user?
2) how do we handle the T0 formats? I'm thinking of asking the user to provide strings like YYYYMMDD, which can be used directly with QDateTime
3) how do we transform from QDateTime to the formats mentioned by Barth above?
4) is there a better way to input the formats than QLineEdits? (Like QComboBox with predefined formats + option to provide your own)

I hope we can sort that out shortly, having this sort of support in kst would be awesome and answer many a request we've already had :-)
Comment 5 Nicolas Brisset 2011-12-05 21:08:00 UTC
Hey, just found QTimeEdit and QDateTimeEdit, which sound like good candidates for those T0 inputs!
Comment 6 Nicolas Brisset 2012-02-24 21:23:28 UTC
*** Bug 243445 has been marked as a duplicate of this bug. ***
Comment 7 Nicolas Brisset 2012-02-24 22:04:33 UTC
OK, after a bit more thought here is what I'd suggest. Please check the 2 mockups I'm going to attach and comment. 
The code to write to get that dialog to work correctly requires some thought, which is why the mockups aren't quite right in terms of what widgets are enabled or not. But it seems to cover all reasonable cases I can think of with a limited complexity.
Comment 8 Nicolas Brisset 2012-02-24 22:05:02 UTC
Created attachment 69075 [details]
ASCII time/date config options with time submenu
Comment 9 Nicolas Brisset 2012-02-24 22:05:37 UTC
Created attachment 69076 [details]
ASCII time/date config options with date submenu
Comment 10 Nicolas Brisset 2012-02-24 22:44:22 UTC
Hopefully the added part of the dialog is straightforward enough that everybody understands it. The idea is to ask for:
- the name of the time vector and its formatting
- the name of the date vector (if separate) and its formatting
- the offset to add to the first data line in case the format is not complete (e.g. does not contain the day or the year and the user wants to have it displayed)

Some more explanations and comments:
1) The time options are as follows (for b and c the lineedit should be disabled, otherwise enabled): 
   a) Index, step in s: [value in the lineedit] => use the line number (index) as time, with a fixed delta of x seconds between two lines (with the initial value being the first line offset + the first value)
   b) C Time (int) => interpret the values in the time vector as an integer corresponding to the number of seconds since Jan. 1st 1970, aka the Epoch
   c) Seconds => interpret the values in the time vector as seconds (to add to the first line offset, if provided)
   d) ASCII, format: [format in the lineedit] => transform the ASCII strings found in the column of the time vector using the format template provided by the user. Should be easy to do using QDateTime::fromString()
2) Date options are as follows (lineedit enabled for both):
   a) Index, step in days: [value in the lineedit] => use the line number (index) as day, with a fixed delta of x days between two lines and the initial value being the first line offset + the first value
   b) ASCII, format: [format in the lineedit] => transform the ASCII strings found in the column of the date vector using the format template provided by the user.
3) As discussed above, when both time and date are specified they should be merged into a single vector. We can discuss how to call it, maybe something like "DateTime" or "Date+Time"?
4) If there is no vector for time or date or the user is lazy it should be OK to not select anything, then we would have the current behavior
5) When you change the format options, especially the line from which to read field names, the comboboxes to select time/date vectors should update. Otherwise the only way is to click "Apply as default" at the bottom, close the dialog and reopen it to finish. I doubt many users will have the idea to do that...
6) We may have to validate the values in the lineedits before parsing the data, especially as they should be either numbers or date/time strings according to the selected options and not all values are acceptable. The initial contents (hhmmss.zzz in my example) should change to reflect the kind of value expected, or be empty to avoid creating confusion. I don't know how difficult it is to provide a behavior with an initial value which disappears as you start typing. Possibly a tooltip is good enough and much less effort
7) If time and date are in a single string (like YYYYMMDD-HHMMSS.SSS) the user should select only the first checkbox. Maybe the current labelling does not reflect that very well
8) The third option does not make sense if there is not at least one of the first two
9) If at least one of the first two options is used then Kst should activate automatically the "interpret as time" feature of the X axis: the corresponding vector should have a flag set somewhere saying it is a time vector

Well, this sounds like quite some code to write, but judging by the number of mails we see on the list requesting that I think the feature is really worth it.
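
To illustrate the four time options from point 1, here is a small Python sketch of how each could turn a column of strings into seconds. It is an assumption-laden outline rather than the dialog's actual logic: the mode names, the function name and the parameters are invented, option a is read as "first line offset plus index times step", the ASCII case uses Python's strptime where the real code would more likely call QDateTime::fromString(), and UTC is assumed.

```python
from datetime import datetime, timezone


def time_vector(mode, column, first_line_offset=0.0, step=1.0, fmt=""):
    """Build a list of seconds for one of the time options a) to d)."""
    if mode == "index":
        # a) line number as time, with a fixed delta in seconds
        return [first_line_offset + i * step for i in range(len(column))]
    if mode == "ctime":
        # b) integer number of seconds since the Epoch
        return [float(int(v)) for v in column]
    if mode == "seconds":
        # c) relative seconds, added to the first line offset
        return [first_line_offset + float(v) for v in column]
    if mode == "ascii":
        # d) formatted strings parsed with the user-supplied template
        return [
            datetime.strptime(v, fmt).replace(tzinfo=timezone.utc).timestamp()
            for v in column
        ]
    raise ValueError(f"unknown time mode: {mode}")


print(time_vector("index", ["a", "b", "c"], first_line_offset=10.0, step=0.5))
print(time_vector("ascii", ["2010/06/25 12:00:00.000"],
                  fmt="%Y/%m/%d %H:%M:%S.%f"))
```

A bad format or a non-numeric value raises ValueError here, which is one place where the validation mentioned in point 6 could hook in before the whole file is parsed.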
Comment 11 Nicolas Brisset 2012-02-24 22:54:56 UTC
Created attachment 69077 [details]
QtDesigner file used for the mockups - ready to be coded :-)
Comment 12 Netterfield 2012-02-25 17:30:45 UTC
A few comments:

How do you specify that columns 1 and 2 form a time/date, and what format they are in?

How do I specify that line 5 contains a time offset and that line 7 contains a dT?

What is the purpose of Date and Time separately?   kst plots do not currently support time of day (but not date) output.  This could be changed if we really need it.  But I can't see a use for date only.
So: datetime and time, with datetime being the most likely scenario.  We could ignore time for now I think.

Basically, there are three types of datetimes in data sources:
 (a) datetime as a numerical value, like JD or ctime in a vector: 
     already supported
 (b) calculable datetime (requires T0, dT, and an index vector)
     these could be manually entered, or could be metadata in the file.
 (c) ASCII specified date as column(s) in the file.  
     There are a number of common formats, and many more uncommon ones.

We should be able to support all three.

(b) types could benefit from scalars in ascii files.
Comment 13 Nicolas Brisset 2012-02-26 22:52:28 UTC
(In reply to comment #12)
> How do you specify that columns 1 and 2 form a time/date, and what format they
> are in?
If you have time in column 1 and date in column 2, you check the two boxes and indicate the format for each. Checking the 2 boxes should automatically create a datetime vector and if possible preselect it for the x vector in the datawizard. As discussed above, if the 2 are required they should be merged.

> How do I specify that line 5 contains a time offset and that line 7 contains a
> dT?
For now, I think manually is enough. You can see the header in the preview pane, just copying the values should be OK.

> What is the purpose of Date and Time separately?   kst plots do not currently
> support time of day (but not date) output.  This could be changed if we really
> need it.  But I can't see a use for date only.
I'm not sure I understand your comment. But regarding date only: imagine meteorological data with amount of rain per day, or a number of downloads per day (e.g. Kst downloads, to visualize the very good trend :-))
In case we have date only, I think using a QDateTime object to convert to a C time internally as currently supported would be good enough.

> So: datetime and time, with datetime being the most likely scenario.  We could
> ignore time for now I think.
I don't understand this one...

> Basically, there are three types of datetimes in data sources:
>  (a) datetime as a numerical value, like JD or ctime in a vector: 
>      already supported
>  (b) calculable datetime (requires T0, dT, and an index vector)
>      these could be manually entered, or could be metadata in the file.
>  (c) ASCII specified date as column(s) in the file.  
>      There are a number of common formats, and many more uncommon ones.
> 
> We should be able to support all three.
Agreed, and I think my proposal does just that.
 
> (b) types could benefit from scalars in ascii files.
Correct. One thing we may try when parsing the header, instead of just creating string metadata: if the lines follow the pattern [sequence of letters]: [number] then create a scalar instead of a string. But I see no easy way to enter that in the same dialog. As mentioned above, I think copying manually is acceptable for that.
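
The header idea could be sketched like this in Python: lines matching [sequence of letters]: [number] become scalars, everything else stays string metadata. The regular expression, the function name and the sample header lines are assumptions made for the example.

```python
import re

# [sequence of letters]: [number], for example "dT: 0.01"
SCALAR_LINE = re.compile(
    r"^\s*([A-Za-z]+)\s*:\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*$"
)


def parse_header(lines):
    """Split header lines into numeric scalars and plain string metadata."""
    scalars, strings = {}, []
    for line in lines:
        match = SCALAR_LINE.match(line)
        if match:
            scalars[match.group(1)] = float(match.group(2))
        else:
            strings.append(line)  # keep everything else as a string
    return scalars, strings


header = ["dT: 0.01", "Operator: Nicolas", "Tzero: 1.5e3", "free text"]
print(parse_header(header))
```

Names containing digits or spaces would not match this strict pattern and would stay strings, so the pattern would probably need loosening for real files.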
Comment 14 Peter Kümmel 2012-10-21 16:25:09 UTC
I've added the reading of all formats supported by QDateTime and QTime.
It needs some better integration into the vector dialog, e.g. times like
17:12:00.123 should not be plotted with a date, because the data also doesn't
define a date. And setting a time offset would also be very useful.
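
One possible way to handle the time-only case, sketched in Python under stated assumptions: keep such values as seconds since midnight instead of attaching a date, and add an optional user-supplied offset. The function name, the default format and the offset parameter are hypothetical and not what was committed to kst.

```python
from datetime import datetime


def time_of_day_seconds(text, fmt="%H:%M:%S.%f", offset=0.0):
    """Convert a time-only string to seconds since midnight plus an offset."""
    t = datetime.strptime(text, fmt).time()  # drop the placeholder date
    return (offset + t.hour * 3600 + t.minute * 60
            + t.second + t.microsecond / 1e6)


print(time_of_day_seconds("17:12:00.123"))
print(time_of_day_seconds("00:00:01.000", offset=3600.0))
```

Plotting such a vector as plain seconds avoids implying a date the data does not contain; with an offset equal to the Epoch seconds of the measurement day, the same values become absolute times.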