Bug 492930 - Race when scheduler goes to next sequence in a group
Summary: Race when scheduler goes to next sequence in a group
Status: REPORTED
Alias: None
Product: kstars
Classification: Applications
Component: general (other bugs)
Version First Reported In: git
Platform: Other Linux
Priority: NOR
Severity: normal
Target Milestone: ---
Assignee: Wolfgang Reissenberger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-10 05:34 UTC by Konstantin Baranov
Modified: 2024-09-10 07:45 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Log (128.79 KB, text/plain), 2024-09-10 05:34 UTC, Konstantin Baranov
Sequence (1.17 KB, application/xml), 2024-09-10 05:35 UTC, Konstantin Baranov
Schedule (2.09 KB, application/xml), 2024-09-10 05:35 UTC, Konstantin Baranov
Analysis of a real session (277.91 KB, text/plain), 2024-09-10 05:40 UTC, Konstantin Baranov

Description Konstantin Baranov 2024-09-10 05:34:50 UTC
Created attachment 173514 [details]
Log

The exact misbehavior varies depending on the machine, devices, etc. For demonstration, I compiled the current version from git and set up a local INDI profile with only the telescope and camera simulators. The sequence and schedule files are attached.
The sequence is simply 3 exposures. The schedule has 3 different targets that share one group name and use the same sequence.
I expect a clean indefinite loop of 3 images of each target.
Instead, on each iteration the scheduler captures and saves 4 images and starts a fifth, which is then aborted and reported as an error.

In my real observatory, depending on timing, each job may capture and actually save its 3 images, but the last one is marked 'aborted'. The scheduler then marks the job as failed, waits 60 s (per my configuration), and proceeds to the next job.
Sometimes everything is fine: 3 successful images and an immediate jump to the next job.
I did not see this problem with 3.7.0; it appeared when I upgraded to 3.7.2.

I don't see this kind of issue with jobs that are not in a group, even if there are other grouped jobs in the same schedule. All of this is, of course, with the Greedy Scheduler enabled.
Comment 1 Konstantin Baranov 2024-09-10 05:35:10 UTC
Created attachment 173515 [details]
Sequence
Comment 2 Konstantin Baranov 2024-09-10 05:35:25 UTC
Created attachment 173516 [details]
Schedule
Comment 3 Konstantin Baranov 2024-09-10 05:40:13 UTC
Created attachment 173517 [details]
Analysis of a real session

A real session with a more complicated configuration. No real errors occurred. All jobs are in a single group. Note how some, but not all, of the jobs have their last image 'aborted'.