Bug 492930

Summary: Race when scheduler goes to next sequence in a group
Product: [Applications] kstars
Reporter: Konstantin Baranov <const>
Component: general
Assignee: Wolfgang Reissenberger <wreissen>
Status: REPORTED ---    
Severity: normal    
Priority: NOR    
Version First Reported In: git   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Log
Sequence
Schedule
Analyze of a real session

Description Konstantin Baranov 2024-09-10 05:34:50 UTC
Created attachment 173514 [details]
Log

The exact misbehavior varies depending on machine, devices, etc. For demonstration, I compiled the current version from git and set up a local INDI profile with only the telescope and camera simulators. The sequence and schedule files are attached.
The sequence is simply 3 exposures. The schedule has 3 different targets that share one Group name and use the same sequence.
I expect to get a clean indefinite loop of 3 images of each target.
Instead, for each iteration, it captures and saves 4 images and starts a fifth, which is then aborted and reported as an error.
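
To make the expectation concrete, here is a minimal sketch of the capture order I would expect from this schedule. It is not KStars code: the target names and the `expected_captures` helper are hypothetical and only model the description above (3 grouped targets, 3 frames per job, repeated indefinitely).

```python
from itertools import cycle, islice


def expected_captures(targets, frames_per_job, iterations):
    """Yield (target, frame_number) pairs in the expected capture order.

    Hypothetical model of the report's expectation: the scheduler visits
    each target of the group in turn and takes exactly `frames_per_job`
    frames before moving on to the next target.
    """
    visits = islice(cycle(targets), iterations * len(targets))
    for target in visits:
        for frame in range(1, frames_per_job + 1):
            yield target, frame


# Placeholder names; the real targets are in the attached schedule file.
TARGETS = ["Target A", "Target B", "Target C"]

# Two full passes over the group: 2 * 3 targets * 3 frames = 18 captures.
plan = list(expected_captures(TARGETS, frames_per_job=3, iterations=2))

for target, frame in plan:
    print(f"{target}: frame {frame}/3")
```

In this model every job ends after exactly 3 saved frames, so a fourth saved frame or an aborted extra frame is a deviation from the expected plan.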

In my real observatory, depending on timing, I may get 3 images captured and actually saved within each job, but the last one is marked 'aborted'. The scheduler marks the job as failed, waits 60 s (per configuration), then proceeds to the next job.
Sometimes it all works: 3 successful images and an immediate jump to the next job.
I did not see this problem with 3.7.0. It appeared when I started upgrading to 3.7.2.

I don't see this kind of issue with jobs that are not in a group, even if there are other grouped jobs in the same schedule. Of course, all of this is with the Greedy Scheduler enabled.
Comment 1 Konstantin Baranov 2024-09-10 05:35:10 UTC
Created attachment 173515 [details]
Sequence
Comment 2 Konstantin Baranov 2024-09-10 05:35:25 UTC
Created attachment 173516 [details]
Schedule
Comment 3 Konstantin Baranov 2024-09-10 05:40:13 UTC
Created attachment 173517 [details]
Analyze of a real session

Real session with a more complicated config. No real errors occurred. All jobs are in a single group. Note how some, but not all, of the jobs have the last image 'aborted'.