Clicking "Disconnect devices" in the Ekos Manager then immediately clicking "Stop INDI" while devices are being disconnected cause KStars to hang. This is apparently a UI issue, as xorg begins eating a lot of cpu time. Not possible to close windows from the task bar, only solution found is to "killall kstars" from a terminal. My setup uses a remote INDI server. This situation probably induces slower reactivity with devices, and might give more time for the end-user to click on those two buttons to trigger the symptom. My setup doesn't have graphics hardware acceleration, so if this is really a UI refresh issue, it will appear more serious on my setup. In any case, KStars is not responsive in this situation, so the load seen on xorg might just be coincidental to a KStars issue.
I believe this issue is due to a problem shutting down the thread in BaseClient at times. Seen this issue before, but it is rare. Does it get stuck on pthread_join?
This still happens in KStars 3.4.0, and on the Mac as well. If you wait (sometimes 20-30 seconds after the last log file entry), then you can stop Indi Server. If you don't wait long enough (and it varies substantially), the application will hang with a spinning beachball cursor and you ultimately need to Force Quit it. Because of how many difficulties I have with Indi, this often results in 2-3 crashes per night. Unfortunately it is not rare; it literally happens to me every observing session at least once.
Yes, confirmed again. I just wrote a test to validate this and can reliably reproduce the issue.
is this about client manager join thread? I rarely see this but it happens. What's the test to run? I cannot reproduce it on my machine and also in a Windows VM.
Reproduced using indi 1.7.9, and indi git tip. Call is stuck with the following stack: 1 __GI___pthread_timedjoin_ex pthread_join_common.c 142 0x7ffff51e3cd7 2 std::thread::join() 0x7ffff4e721d7 3 INDI::BaseClient::disconnectServer baseclient.cpp 291 0x55555611f6a6 4 DriverManager::stopDevices drivermanager.cpp 464 0x555555e083a1 5 Ekos::Manager::cleanDevices manager.cpp 1087 0x55555565716b 6 Ekos::Manager::stop manager.cpp 532 0x5555556530db 7 Ekos::Manager::processINDI manager.cpp 527 0x555555653093 8 QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (Ekos::Manager:: *)()>::call(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *) qobjectdefs_impl.h 152 0x555555692808 9 QtPrivate::FunctionPointer<void (Ekos::Manager:: *)()>::call<QtPrivate::List<>, void>(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *) qobjectdefs_impl.h 185 0x55555568dbf1 10 QtPrivate::QSlotObject<void (Ekos::Manager:: *)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void * *, bool *) qobjectdefs_impl.h 414 0x5555556856a6 11 QMetaObject::activate(QObject *, int, int, void * *) 0x7ffff57295c8 12 QAbstractButton::clicked(bool) 0x7ffff632e236 13 ?? 0x7ffff632e45e 14 ?? 0x7ffff632f8a3 15 QAbstractButton::mouseReleaseEvent(QMouseEvent *) 0x7ffff632fa65 16 QWidget::event(QEvent *) 0x7ffff627c04e 17 QApplicationPrivate::notify_helper(QObject *, QEvent *) 0x7ffff6239a86 18 QApplication::notify(QObject *, QEvent *) 0x7ffff6243053 19 QTest::mouseEvent qtestmouse.h 242 0x55555562bd4a 20 QTest::mouseEvent qtestmouse.h 206 0x55555562b852 ... <More>
I need to make very sure the test is spotting the right problem. I'll then push a differential with said test for reference.
In my test, INDI client is still creating properties after 20 seconds, so I'll investigate further before calling the test proper: 1 syscall syscall.S 38 0x7ffff4b5a94d 2 QSemaphore::acquire(int) 0x7ffff55390b1 3 QMetaObject::activate(QObject *, int, int, void * *) 0x7ffff57298e5 4 ClientManager::newINDIProperty moc_clientmanager.cpp 346 0x555555db7dc0 5 ClientManager::newProperty clientmanager.cpp 92 0x555555e18d20 6 INDI::BaseDevice::buildProp basedevice.cpp 682 0x55555611822d 7 INDI::BaseClient::dispatchCommand baseclient.cpp 584 0x555556120c0c 8 INDI::BaseClient::listenINDI baseclient.cpp 505 0x5555561203a1 9 INDI::BaseClient::listenHelper baseclient.cpp 374 0x55555611fa9d 10 std::__invoke_impl<void *, void * ( *)(void *), INDI::BaseClient *> invoke.h 60 0x55555612d55a 11 std::__invoke<void * ( *)(void *), INDI::BaseClient *> invoke.h 95 0x55555612d3fd 12 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::_M_invoke<0ul, 1ul> thread 244 0x55555612d2cd 13 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::operator() thread 251 0x55555612d259 14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>>::_M_run thread 195 0x55555612d217 15 ?? 0x7ffff4e71f74 16 start_thread pthread_create.c 479 0x7ffff51e2669 17 clone clone.S 95 0x7ffff4b61323
I've seen this before but couldn't figure out the root cause :(
Refining the test, it appears the issue may source from the intrication of signals between gadgets. The code is re-entring the event loop at various places, and if the INDI server is still busy deleting properties, the code locks up in QMetaObject::Activate. Originally I thought it was the way QTest is implemented, which requires lots of deferred calls to manipulate the UI, but it seems I got the test right in the end. Jasem, I pushed D27593 with the test showing the issue. Remove the delay as specified in the differential to reproduce.
When the issue is reproduced, this is the stack from the event clicking the "stop" button: 1 __GI___pthread_timedjoin_ex pthread_join_common.c 142 0x7ffff51e3cd7 2 std::thread::join() 0x7ffff4e721d7 3 INDI::BaseClient::disconnectServer baseclient.cpp 291 0x55555611fe0c 4 DriverManager::stopDevices drivermanager.cpp 464 0x555555e08b07 5 Ekos::Manager::cleanDevices manager.cpp 1087 0x555555657e19 6 Ekos::Manager::stop manager.cpp 532 0x555555653d89 7 Ekos::Manager::processINDI manager.cpp 527 0x555555653d41 8 QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (Ekos::Manager:: *)()>::call(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *) qobjectdefs_impl.h 152 0x5555556934b6 9 QtPrivate::FunctionPointer<void (Ekos::Manager:: *)()>::call<QtPrivate::List<>, void>(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *) qobjectdefs_impl.h 185 0x55555568e89f 10 QtPrivate::QSlotObject<void (Ekos::Manager:: *)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void * *, bool *) qobjectdefs_impl.h 414 0x555555686354 11 QMetaObject::activate(QObject *, int, int, void * *) 0x7ffff57295c8 12 QAbstractButton::clicked(bool) 0x7ffff632e236 13 ?? 0x7ffff632e45e 14 ?? 0x7ffff632f8a3 15 QAbstractButton::mouseReleaseEvent(QMouseEvent *) 0x7ffff632fa65 16 QWidget::event(QEvent *) 0x7ffff627c04e 17 QApplicationPrivate::notify_helper(QObject *, QEvent *) 0x7ffff6239a86 18 QApplication::notify(QObject *, QEvent *) 0x7ffff6243053 19 QTest::mouseEvent qtestmouse.h 242 0x55555562bd74 20 QTest::mouseEvent qtestmouse.h 206 0x55555562b87c ... <More> This is the stack from the INDI client thread: 1 syscall syscall.S 38 0x7ffff4b5a94d 2 QSemaphore::acquire(int) 0x7ffff55390b1 3 QMetaObject::activate(QObject *, int, int, void * *) 0x7ffff57298e5 4 ClientManager::removeINDIProperty moc_clientmanager.cpp 353 0x555555db8598 5 ClientManager::removeProperty clientmanager.cpp 109 0x555555e197e5 6 INDI::BaseClient::delPropertyCmd baseclient.cpp 625 0x5555561216b4 7 INDI::BaseClient::dispatchCommand baseclient.cpp 538 0x555556120d83 8 INDI::BaseClient::listenINDI baseclient.cpp 505 0x555556120b07 9 INDI::BaseClient::listenHelper baseclient.cpp 374 0x555556120203 10 std::__invoke_impl<void *, void * ( *)(void *), INDI::BaseClient *> invoke.h 60 0x55555612dcc0 11 std::__invoke<void * ( *)(void *), INDI::BaseClient *> invoke.h 95 0x55555612db63 12 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::_M_invoke<0ul, 1ul> thread 244 0x55555612da33 13 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::operator() thread 251 0x55555612d9bf 14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>>::_M_run thread 195 0x55555612d97d 15 ?? 0x7ffff4e71f74 16 start_thread pthread_create.c 479 0x7ffff51e2669 17 clone clone.S 95 0x7ffff4b61323 Two other INDI client threads are waiting on select(). I'm not sure why they are there because I don't know the Ekos client that much, but that's probably minor for this issue.
Reproduced on my live (not current) system, confirming test is OK: ^C[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1". 0xb7f5bd09 in __kernel_vsyscall () (gdb) bt #0 0xb7f5bd09 in __kernel_vsyscall () #1 0xb522a7fe in __GI___pthread_timedjoin_ex (threadid=2750102336, thread_return=0x0, abstime=0x0, block=true) at pthread_join_common.c:89 #2 0xb522a604 in __pthread_join (threadid=2750102336, thread_return=0x0) at pthread_join.c:24 #3 0xb4f0e7b5 in std::thread::join() () at /usr/lib/i386-linux-gnu/libstdc++.so.6 #4 0x00c6d1bf in INDI::BaseClient::disconnectServer() (this=0x5de1f68) at /home/tallfurryman/dev/indinew/libs/indibase/baseclient.cpp:291 #5 0x0072121b in DriverManager::stopDevices(QList<DriverInfo*> const&) (this=0x5679ba0, dList=...) at /home/tallfurryman/dev/kstars/kstars/indi/drivermanager.cpp:464 #6 0x007b51ab in Ekos::Manager::cleanDevices(bool) (this=0x2e879b0, stopDrivers=true) at /home/tallfurryman/dev/kstars/kstars/ekos/manager.cpp:1017 #7 0x007b525a in Ekos::Manager::stop() (this=0x2e879b0) at /home/tallfurryman/dev/kstars/kstars/ekos/manager.cpp:498 #8 0x00c19160 in EkosAdaptor::stop() (this=0x55b9530) at /home/tallfurryman/dev/kstars_build/kstars/ekosadaptor.cpp:111 #9 0x00c19160 in EkosAdaptor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0x55b9530, _c=QMetaObject::InvokeMetaMethod, _id=13, _a=0xbfb29484) at /home/tallfurryman/dev/kstars_build/kstars/ekosadaptor.moc:210 #10 0x00c19614 in EkosAdaptor::qt_metacall(QMetaObject::Call, int, void**) (this=0x55b9530, _c=QMetaObject::InvokeMetaMethod, _id=13, _a=0xbfb29484) at /home/tallfurryman/dev/kstars_build/kstars/ekosadaptor.moc:314 #11 0xb5ca850c in () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5 #12 0xb5cad87a in () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5 #13 0xb5caddd8 in () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5 #14 0xb5caea06 in () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5
Updated version affected to 3.4.0.
Git commit f8c3d1359ac456449f2a7dc507ca8f4f73a8d406 by Jasem Mutlaq. Committed on 25/03/2020 at 15:45. Pushed by mutlaqja into branch 'master'. Change remoteProperty connection to the default queued connection. Related: bug 416173 FIXED-IN:3.4.2 M +15 -16 kstars/ekos/capture/capture.cpp M +2 -1 kstars/ekos/capture/capture.h M +21 -21 kstars/ekos/guide/guide.cpp M +2 -1 kstars/ekos/guide/guide.h M +1 -2 kstars/indi/drivermanager.cpp M +6 -2 kstars/indi/guimanager.cpp M +41 -31 kstars/indi/indidevice.cpp M +51 -43 kstars/indi/indidevice.h https://commits.kde.org/kstars/f8c3d1359ac456449f2a7dc507ca8f4f73a8d406
Jasem, could you confirm you adjusted the relevant test to verify your change? Tests/kstars_ui/test_ekos.cpp, in TestEkos::testSimulatorProfile, line 134. I don't see this in your changelog.
Everything passed, except this: XFAIL : KStarsUiTests::initialConditionsTest() Initial KStars clock is set from system local time, not geolocation, and is untestable for now. Loc: [/home/jasem/Projects/kstars/Tests/kstars_ui/kstars_ui_tests.cpp(96)] XFAIL : KStarsUiTests::initialConditionsTest() Precision of KStars local time conversion to local time does not allow strict millisecond comparison. Loc: [/home/jasem/Projects/kstars/Tests/kstars_ui/kstars_ui_tests.cpp(99)] Btw, I couldn't get it to crash even before.