Bug 454866

Summary: QSocketNotifier Invalid socket error
Product: [Applications] kstars Reporter: steve <steve.lcb>
Component: generalAssignee: Jasem Mutlaq <mutlaqja>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 3.5.9   
Target Milestone: ---   
Platform: Kubuntu   
OS: Linux   
Latest Commit: Version Fixed In:

Description steve 2022-06-05 08:48:19 UTC
kstars and EKOS are shut down prematurely


STEPS TO REPRODUCE
1. Leave a capture and guiding job running 
2. 
3. 

OBSERVED RESULT
kstars and EKOS are shut down prematurely after the socket error occurs

EXPECTED RESULT
the jobs are finished

SOFTWARE/OS VERSIONS
kubuntu 20.04.3 fully apt upgraded

ADDITIONAL INFORMATION
The same error occurs using the git sources from today
Comment 1 steve 2022-06-05 08:49:36 UTC
This was also reported to the indilib forum:
https://www.indilib.org/forum/ekos/11784-kstars-qsocketnotifier-invalid-socket-error.html

Thanks
Comment 2 Jasem Mutlaq 2022-06-06 05:14:15 UTC
Any way to reproduce this? I cannot reproduce it here.
Comment 3 steve 2022-06-06 08:14:05 UTC
I'm using a Canon eos700d, eq6 mount and asi120mm guide camera.
The error occurs after taking around 15 to 20 frames (could be more, could be less).
Socket error, so I thought it maybe the vnc server or client so switched from x11vnc to krfb but the session ends similarly with the socket error:


[2022-06-06T00:31:04.179 CEST DEBG ][     org.kde.kstars.ekos.guide] - Capturing frame...
[2022-06-06T00:31:04.197 CEST DEBG ][           org.kde.kstars.indi] - ZWO CCD ASI120MM Mini : "[DEBUG] StartExposure->setexp : 2.500s "
[2022-06-06T00:31:04.281 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected. Exit code: -1
[2022-06-06T00:31:04.281 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "ALIGNDATA" Exit code: -1
[2022-06-06T00:31:04.281 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "HORIZONLIMITSDATAFITS" Exit code: -1
[2022-06-06T00:31:04.281 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "ALIGNMENT_POINT_OPTIONAL_BINARY_BLOB" Exit code: -1
[2022-06-06T00:31:04.604 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "ZWO CCD ASI120MM Mini" Property: "CCD1" Exit code: -1
[2022-06-06T06:47:30.938 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 8 and type 'Read', disabling...
\00\00\00\00\00\00\00[2022-06-06T06:47:30.939 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 24 and type 'Read', disabling...
[2022-06-06T06:47:30.939 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 11 and type 'Read', disabling...
[2022-06-06T06:47:30.942 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 68 and type 'Read', disabling...
[2022-06-06T06:47:30.943 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "Canon DSLR EOS 700D" Property: "CCD1" Exit code: -1

As it worked fine with 3.5.8, any ideas what I could try?
Comment 4 steve 2022-06-06 08:18:21 UTC
There are lots of these in the syslog at the time of the indi disconnection:
usb 3-2: reset high speed USB device number 3 using xhci_hcd

Any pointers?
Comment 5 Jasem Mutlaq 2022-06-06 09:05:19 UTC
No there are separate USB issues. The error begins with INDI server disconnecting

INDI server disconnected. Exit code: -1

No reason was given for this. After the BLOB managers were removed, there was probably another thread trying to send data on the now-invalid socket and that might lead to this crash.
Comment 6 steve 2022-06-06 11:51:10 UTC
Hi
Unfortunately, it happens randomly when capturing and/or guiding but always within an hour of starting the session.
eos700d, eq6, zwo120mm mini

There are lots of these in the syslog. Don't know if it's related:
'
Jun 6 13:30:46 E5430 kernel: [ 550.921174] usb 3-2: reset high-speed USB device number 3 using xhci_hcd

Coinciding with each and every guide exposure.
I've tried the previous asi sdk 1.14.1108 as well as the current 1.22

Tried on a separate computer
Tried replacing cables and using different usb ports
Replaced asi120mm mini with both old asi120 and the t7m asi clone.
Always the same usb error.

At a loss... Is there anything I can try to get rid of the socket error?
TIA
Comment 7 steve 2022-06-06 20:33:33 UTC
Does:

Fix crash on driver restart; a541ab91429cc4f5b1df546fe2a2ce27d10dc900 
https://github.com/KDE/kstars/commit/a541ab91429cc4f5b1df546fe2a2ce27d10dc900

Have any bearing upn this?
Comment 8 Jasem Mutlaq 2022-06-07 04:28:33 UTC
Unlikely.. unless you restarted your driver explicitly. Need to find WHY INDI server gets disconnected exactly.
Comment 9 steve 2022-06-07 07:26:58 UTC
The issue doesn't occur with 3.5.8, so something which changed between then and 3.5.9?
Comment 10 Jasem Mutlaq 2022-06-07 08:08:02 UTC
You can use git bisect to find the offending commit. This would take a while but you can figure out which commit caused this.
Comment 11 steve 2022-06-07 08:39:40 UTC
Can you point me to a method for gir bisect?
Thanks
Comment 12 Jasem Mutlaq 2022-06-07 10:23:50 UTC
There are many guides online on how to git bisect. Basically, you try to go back until the last "good" commit you can find. So maybe when v3.5.8 was released (3369e6afa815a6d1e2089f417eade08d69e668a2) and test with this. If it is good and no crash, mark this as your good commit and mark HEAD as the bad commit, then start git bisect. In each step, you must recompile KStars (just run sudo make -j20 install) and then test. Then continue with good/bad until GIT finds the offending commit.
Comment 13 steve 2022-06-07 12:23:35 UTC
OK
Does this look OK?

steve@E5430:~/Projects/kstars$ git bisect start
steve@E5430:~/Projects/kstars$ git bisect bad
steve@E5430:~/Projects/kstars$ git bisect good 3369e6afa815a6d1e2089f417eade08d69e668a2
Bisecting: 85 revisions left to test after this (roughly 6 steps)
[361d241d3d76b53cc92bc98dbb237c4fe6d79000] Use upstream CMake config for Eigen

What does 'Use upstream CMake config for Eigen' mean?
Comment 14 steve 2022-06-07 13:14:35 UTC
Just seen that it's the name of the commit which git bisect checked out. DUH!
Comment 15 steve 2022-06-07 23:31:38 UTC
OK. That checkout gave the same error:

[2022-06-08T01:03:56.394 CEST DEBG ][           org.kde.kstars.indi] - ZWO CCD ASI120MM Mini : "[DEBUG] Upload complete "
[2022-06-08T01:03:56.410 CEST DEBG ][           org.kde.kstars.indi] - Image received. Mode: "Guide" Size: 1235520
[2022-06-08T01:03:56.410 CEST DEBG ][           org.kde.kstars.fits] - Reading file buffer ( "1,2 MiB" )
[2022-06-08T01:03:56.521 CEST DEBG ][     org.kde.kstars.ekos.guide] - Received guide frame.
[2022-06-08T01:03:56.522 CEST DEBG ][     org.kde.kstars.ekos.guide] - Multistar: findTopStars 250
[2022-06-08T01:03:56.522 CEST DEBG ][           org.kde.kstars.fits] - Sextract with:  "1-Guide-Default"
[2022-06-08T01:03:56.563 CEST FATL ][                       default] - ASSERT: "isDetached()" in file /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h, line 392
[2022-06-08T01:03:56.564 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 6 and type 'Read', disabling...
[2022-06-08T01:03:56.564 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 45 and type 'Read', disabling...
[2022-06-08T01:03:56.564 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 43 and type 'Read', disabling...
[2022-06-08T01:03:56.564 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 65 and type 'Read', disabling...
[2022-06-08T01:03:56.564 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 49 and type 'Read', disabling...
[2022-06-08T01:03:56.595 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 66 and type 'Read', disabling...
[2022-06-08T01:03:56.595 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 68 and type 'Read', disabling...
[2022-06-08T01:03:56.595 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 67 and type 'Exception', disabling...
[2022-06-08T01:03:56.872 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "ZWO CCD ASI120MM Mini" Property: "CCD1" Exit code: -1
[2022-06-08T01:03:56.872 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected. Exit code: -1
[2022-06-08T01:03:57.088 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "ALIGNDATA" Exit code: -1
[2022-06-08T01:03:57.088 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "HORIZONLIMITSDATAFITS" Exit code: -1
[2022-06-08T01:03:57.089 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "ALIGNMENT_POINT_OPTIONAL_BINARY_BLOB" Exit code: -1
\00\00\00\00\00\00\00[2022-06-08T01:14:25.862 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 8 and type 'Read', disabling...
[2022-06-08T01:14:25.862 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 23 and type 'Read', disabling...
[2022-06-08T01:14:25.862 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 11 and type 'Read', disabling...
[2022-06-08T01:14:25.865 CEST DEBG ][           org.

So ran git bisect again:

git bisect bad
Bisecting: 33 revisions left to test after this (roughly 5 steps)
[66ac08e5250adb64b2e3b63031c09aedde270fa8] use captureStates directly to send the exact capture status string
git bisect bad
Bisecting: 33 revisions left to test after this (roughly 5 steps)
[66ac08e5250adb64b2e3b63031c09aedde270fa8] use captureStates directly to send the exact capture status string
bad. So:

Now to compile and test again...
I think...???
Comment 16 Jasem Mutlaq 2022-06-08 07:35:10 UTC
Yes need to compile and test after each step.
Comment 17 steve 2022-06-08 09:17:27 UTC
Yep. OK. I think I've understood.

Here's the next one:
 git bisect good
Bisecting: 16 revisions left to test after this (roughly 4 steps)
[3d989ff71a2d1037fdcf0dd6fb2e0aafab144f6d] Fix broken checkbox label and tooltip.
Comment 18 steve 2022-06-11 05:16:11 UTC
For this bisect checkout, the indi server survived but still crashed upon:

[2022-06-11T04:16:40.210 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 76 and type 'Read', disabling...

Anyway, 2 bisect steps left.
Comment 19 steve 2022-06-13 01:52:43 UTC
 git bisect bad
b028a6e607f64cb0f7eac284732e5b12c6df366d is the first bad commit
commit b028a6e607f64cb0f7eac284732e5b12c6df366d
Author: Hy Murveit <hy-2@murveit.com>
Date:   Tue Apr 12 07:24:03 2022 +0000

    Add Target Distance to the Analyze graph/stats
    
    Adds the Target Distance statistic calculated by the scheduler to Analyze: how far the most recent capture is from the target position (in arc-seconds). Does this by adding a scheduler signal (targetDistance) which is received and processed by analyze.
    
    This also includes adding the target distance to the capture summary tab (commit by @wreissenberger).

 kstars/ekos/analyze/analyze.cpp               | 40 +++++++++++
 kstars/ekos/analyze/analyze.h                 |  4 ++
 kstars/ekos/analyze/analyze.ui                | 48 ++++++++++++++
 kstars/ekos/capture/capture.cpp               | 31 +++++++++
 kstars/ekos/capture/capture.h                 |  6 ++
 kstars/ekos/capture/capture.ui                | 35 +++++++++-
 kstars/ekos/capture/capturepreviewwidget.cpp  | 23 ++++---
 kstars/ekos/capture/capturepreviewwidget.h    |  6 ++
 kstars/ekos/capture/captureprocessoverlay.cpp | 15 +++++
 kstars/ekos/capture/captureprocessoverlay.h   |  8 +++
 kstars/ekos/capture/captureprocessoverlay.ui  | 96 +++++++++++++++++----------
 kstars/ekos/manager.cpp                       |  9 ++-
 kstars/ekos/scheduler/scheduler.cpp           |  1 +
 kstars/ekos/scheduler/scheduler.h             |  3 +
 kstars/kstars.kcfg                            |  4 ++
 15 files changed, 283 insertions(+), 46 deletions(-)
Comment 20 steve 2022-06-13 02:16:52 UTC
log corresponding to bad git bisect here:
https://drive.google.com/file/d/14nhEY_x8oLKdVzk54Mg67o2VUWsqyzCn/view?usp=sharing

Thanks
Comment 21 Jasem Mutlaq 2022-06-13 04:36:33 UTC
Are you 100% positive this is the bad commit? Can you try to checkout the previous commit and see if it causes the crash? just to be sure.
Comment 22 steve 2022-06-13 06:01:45 UTC
This is the bad commit according to git bisect, not myself. 
Please could you help by telling me which is the previous commit you would like testing?
Thanks.
Comment 23 steve 2022-06-13 09:38:56 UTC
OK. Found it in the git log. The previous commit is this one:
3d989ff71a2d1037fdcf0dd6fb2e0aafab144f6d
Will try.
Comment 24 steve 2022-06-13 14:35:35 UTC
That commit has already proven to be good.

here is the bisect log:

git bisect start
# bad: [ec5c439a01b81aa93fd67abe6fee370e3846a005] For lightbox, we must check target state since for dark frame it must turned off and not kept on which results in infinite loop
git bisect bad ec5c439a01b81aa93fd67abe6fee370e3846a005
# good: [3369e6afa815a6d1e2089f417eade08d69e668a2] KStars stable v3.5.8 release
git bisect good 3369e6afa815a6d1e2089f417eade08d69e668a2
# bad: [361d241d3d76b53cc92bc98dbb237c4fe6d79000] Use upstream CMake config for Eigen
git bisect bad 361d241d3d76b53cc92bc98dbb237c4fe6d79000
# good: [66ac08e5250adb64b2e3b63031c09aedde270fa8] use captureStates directly to send the exact capture status string
git bisect good 66ac08e5250adb64b2e3b63031c09aedde270fa8
# good: [3d989ff71a2d1037fdcf0dd6fb2e0aafab144f6d] Fix broken checkbox label and tooltip.
git bisect good 3d989ff71a2d1037fdcf0dd6fb2e0aafab144f6d
# bad: [b843bc9a8409215704ab169f9856eed0984533c9] Respect None setting
git bisect bad b843bc9a8409215704ab169f9856eed0984533c9
# bad: [3df00202fae28e3990590d041086baa616c8d030] Add support for HiPS offline storage. Reduce unnecessary significant...
git bisect bad 3df00202fae28e3990590d041086baa616c8d030
# bad: [752ed710d371bd59950ba6810449efc5a1171258] Prevent destructor from starting when fitsview is being updated
git bisect bad 752ed710d371bd59950ba6810449efc5a1171258
# bad: [b028a6e607f64cb0f7eac284732e5b12c6df366d] Add Target Distance to the Analyze graph/stats
git bisect bad b028a6e607f64cb0f7eac284732e5b12c6df366d
# first bad commit: [b028a6e607f64cb0f7eac284732e5b12c6df366d] Add Target Distance to the Analyze graph/stats
Comment 25 steve 2022-06-19 04:19:28 UTC
Hi
Any news on this bug?
Thanks
Comment 26 steve 2022-06-19 07:12:02 UTC
Do you think this may help?
https://www.indilib.org/forum/development/11843-fast-blobs.html
Thanks.
Comment 27 steve 2022-06-19 22:36:22 UTC
This is the kstars git built with the latest INDI as requested testing from here:
https://www.indilib.org/forum/development/11843-fast-blobs.html

It still crashes with the same indi disconnect error.

log:
https://drive.google.com/file/d/1MpM6qfqNJBhJuWuzNPHIlhupwDLX3WgO/view?usp=sharing

Anything else I could try?
Comment 28 steve 2022-06-20 08:23:20 UTC
The qsocketnotifier error and indi disconnect seem to be precipitated after a Sextract:


[2022-06-20T00:20:42.763 CEST DEBG ][     org.kde.kstars.ekos.guide] - Received guide frame.
[2022-06-20T00:20:42.763 CEST DEBG ][     org.kde.kstars.ekos.guide] - Multistar: findTopStars 250
[2022-06-20T00:20:42.764 CEST DEBG ][           org.kde.kstars.fits] - Sextract with:  "1-Guide-Default"
[2022-06-20T00:20:42.815 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 6 and type 'Read', disabling...
[2022-06-20T00:20:42.815 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 47 and type 'Read', disabling...
[2022-06-20T00:20:42.815 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 45 and type 'Read', disabling...
[2022-06-20T00:20:42.815 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 68 and type 'Read', disabling...
[2022-06-20T00:20:42.815 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 51 and type 'Read', disabling...
[2022-06-20T00:20:42.831 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 69 and type 'Read', disabling...
[2022-06-20T00:20:42.831 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 70 and type 'Exception', disabling...
[2022-06-20T00:20:42.831 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 71 and type 'Read', disabling...
[2022-06-20T00:20:42.989 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "ZWO CCD ASI120MM Mini" Property: "CCD1" Exit code: -1
[2022-06-20T00:20:42.989 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected. Exit code: -1
Comment 29 steve 2022-06-20 21:57:48 UTC
This is with the latest indi:
commit 3d63e3be5008605ce04621fb796f1a9b32c88a86 (HEAD -> master, origin/master, origin/HEAD)
Author: Jasem Mutlaq <mutlaqja@ikarustech.com>
Date:   Sat Jun 18 20:00:15 2022 +0300

    Use IDSharedBlobAlloc and IDSharedBlobRealloc as suggested by Ludovic

Latest 3rd-party asi, gphot and eqmod:
commit 3615b54473761d3a1067a312ae1a6a888a9fc370 (HEAD -> master, origin/master, origin/HEAD)
Author: Radek Kaczorek <rkaczorek@gmail.com>
Date:   Sat Jun 18 11:14:49 2022 +0200

    Bumping debian compatibility levels to 10 (#590)


 and the latest kstars git:
commit 11846aa76610e4231a735000e4289f28fe906c5a (HEAD -> master, origin/master, origin/HEAD)
Author: Jasem Mutlaq <mutlaqja@ikarustech.com>
Date:   Mon Jun 20 11:51:58 2022 +0300

    Send updated error to EkosLive
--- --- ---

For this test, I used a different guide camera and cable:
The log ends similarly to the session posted yesterday with the crash and kstars disappearing:

[2022-06-20T23:37:30.672 CEST DEBG ][     org.kde.kstars.ekos.guide] - Received guide frame.
[2022-06-20T23:37:30.672 CEST DEBG ][     org.kde.kstars.ekos.guide] - Multistar: findTopStars 250
[2022-06-20T23:37:30.673 CEST DEBG ][           org.kde.kstars.fits] - Sextract with:  "1-Guide-Default"
[2022-06-20T23:37:30.716 CEST FATL ][                       default] - ASSERT: "isDetached()" in file /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h, line 392
[2022-06-20T23:37:30.717 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 6 and type 'Read', disabling...
[2022-06-20T23:37:30.717 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 44 and type 'Read', disabling...
[2022-06-20T23:37:30.717 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 42 and type 'Read', disabling...
[2022-06-20T23:37:30.717 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 67 and type 'Read', disabling...
[2022-06-20T23:37:30.718 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 48 and type 'Read', disabling...
[2022-06-20T23:37:30.732 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 68 and type 'Read', disabling...
[2022-06-20T23:37:30.732 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 69 and type 'Exception', disabling...
[2022-06-20T23:37:30.732 CEST WARN ][                       default] - QSocketNotifier: Invalid socket 70 and type 'Read', disabling...
[2022-06-20T23:37:30.976 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected from BLOB manager for Device: "EQMod Mount" Property: "ALIGNDATA" Exit code: -1
[2022-06-20T23:37:30.976 CEST DEBG ][           org.kde.kstars.indi] - INDI server disconnected. Exit code: -1
Comment 30 steve 2022-06-30 08:36:35 UTC
Works OK with phd2 which does NOT call sextract.
Tested over 2 sessions of 5 hours guiding with phd2. 
logs etc: 
https://drive.google.com/drive/folders/1B-OOitN-K5OCmUHoXOgwCXh1Lq3ZPNiG?usp=sharing
Comment 31 steve 2022-08-18 11:49:32 UTC
Hi everyone
This is fixed by turning off stellarsolver partitioning. The latter is now default as of kstars 3.6.0

**does this count as RESOLVED?