Bug 398192 - In Ekos, stopping INDI immediately after disconnecting devices hangs KStars
Summary: In Ekos, stopping INDI immediately after disconnecting devices hangs KStars
Status: RESOLVED FIXED
Alias: None
Product: kstars
Classification: Applications
Component: general (show other bugs)
Version: 3.4.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: TallFurryMan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-03 11:22 UTC by TallFurryMan
Modified: 2020-03-26 06:51 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In: 3.4.2


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description TallFurryMan 2018-09-03 11:22:55 UTC
Clicking "Disconnect devices" in the Ekos Manager then immediately clicking "Stop INDI" while devices are being disconnected cause KStars to hang.

This is apparently a UI issue, as xorg begins eating a lot of cpu time. Not possible to close windows from the task bar, only solution found is to "killall kstars" from a terminal.

My setup uses a remote INDI server. This situation probably induces slower reactivity with devices, and might give more time for the end-user to click on those two buttons to trigger the symptom.

My setup doesn't have graphics hardware acceleration, so if this is really a UI refresh issue, it will appear more serious on my setup. In any case, KStars is not responsive in this situation, so the load seen on xorg might just be coincidental to a KStars issue.
Comment 1 Jasem Mutlaq 2018-09-03 11:27:49 UTC
I believe this issue is due to a problem shutting down the thread in BaseClient at times. Seen this issue before, but it is rare. Does it get stuck on pthread_join?
Comment 2 MountainAirCA 2020-02-21 07:19:57 UTC
This still happens in KStars 3.4.0, and on the Mac as well.  If you wait (sometimes 20-30 seconds after the last log file entry), then you can stop Indi Server.  If you don't wait long enough (and it varies substantially), the application will hang with a spinning beachball cursor and you ultimately need to Force Quit it.

Because of how many difficulties I have with Indi, this often results in 2-3 crashes per night.  Unfortunately it is not rare; it literally happens to me every observing session at least once.
Comment 3 TallFurryMan 2020-02-22 10:08:41 UTC
Yes, confirmed again.
I just wrote a test to validate this and can reliably reproduce the issue.
Comment 4 Jasem Mutlaq 2020-02-22 10:33:35 UTC
is this about client manager join thread? I rarely see this but it happens. What's the test to run? I cannot reproduce it on my machine and also in a Windows VM.
Comment 5 TallFurryMan 2020-02-22 15:17:34 UTC
Reproduced using indi 1.7.9, and indi git tip. Call is stuck with the following stack:

1   __GI___pthread_timedjoin_ex                                                                                                                                        pthread_join_common.c 142  0x7ffff51e3cd7 
2   std::thread::join()                                                                                                                                                                           0x7ffff4e721d7 
3   INDI::BaseClient::disconnectServer                                                                                                                                 baseclient.cpp        291  0x55555611f6a6 
4   DriverManager::stopDevices                                                                                                                                         drivermanager.cpp     464  0x555555e083a1 
5   Ekos::Manager::cleanDevices                                                                                                                                        manager.cpp           1087 0x55555565716b 
6   Ekos::Manager::stop                                                                                                                                                manager.cpp           532  0x5555556530db 
7   Ekos::Manager::processINDI                                                                                                                                         manager.cpp           527  0x555555653093 
8   QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (Ekos::Manager:: *)()>::call(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *) qobjectdefs_impl.h    152  0x555555692808 
9   QtPrivate::FunctionPointer<void (Ekos::Manager:: *)()>::call<QtPrivate::List<>, void>(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *)                       qobjectdefs_impl.h    185  0x55555568dbf1 
10  QtPrivate::QSlotObject<void (Ekos::Manager:: *)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void * *, bool *)                  qobjectdefs_impl.h    414  0x5555556856a6 
11  QMetaObject::activate(QObject *, int, int, void * *)                                                                                                                                          0x7ffff57295c8 
12  QAbstractButton::clicked(bool)                                                                                                                                                                0x7ffff632e236 
13  ??                                                                                                                                                                                            0x7ffff632e45e 
14  ??                                                                                                                                                                                            0x7ffff632f8a3 
15  QAbstractButton::mouseReleaseEvent(QMouseEvent *)                                                                                                                                             0x7ffff632fa65 
16  QWidget::event(QEvent *)                                                                                                                                                                      0x7ffff627c04e 
17  QApplicationPrivate::notify_helper(QObject *, QEvent *)                                                                                                                                       0x7ffff6239a86 
18  QApplication::notify(QObject *, QEvent *)                                                                                                                                                     0x7ffff6243053 
19  QTest::mouseEvent                                                                                                                                                  qtestmouse.h          242  0x55555562bd4a 
20  QTest::mouseEvent                                                                                                                                                  qtestmouse.h          206  0x55555562b852 
... <More>
Comment 6 TallFurryMan 2020-02-22 15:19:14 UTC
I need to make very sure the test is spotting the right problem.
I'll then push a differential with said test for reference.
Comment 7 TallFurryMan 2020-02-22 15:27:42 UTC
In my test, INDI client is still creating properties after 20 seconds, so I'll investigate further before calling the test proper:

1  syscall                                                                                                      syscall.S             38  0x7ffff4b5a94d 
2  QSemaphore::acquire(int)                                                                                                               0x7ffff55390b1 
3  QMetaObject::activate(QObject *, int, int, void * *)                                                                                   0x7ffff57298e5 
4  ClientManager::newINDIProperty                                                                               moc_clientmanager.cpp 346 0x555555db7dc0 
5  ClientManager::newProperty                                                                                   clientmanager.cpp     92  0x555555e18d20 
6  INDI::BaseDevice::buildProp                                                                                  basedevice.cpp        682 0x55555611822d 
7  INDI::BaseClient::dispatchCommand                                                                            baseclient.cpp        584 0x555556120c0c 
8  INDI::BaseClient::listenINDI                                                                                 baseclient.cpp        505 0x5555561203a1 
9  INDI::BaseClient::listenHelper                                                                               baseclient.cpp        374 0x55555611fa9d 
10 std::__invoke_impl<void *, void * ( *)(void *), INDI::BaseClient *>                                          invoke.h              60  0x55555612d55a 
11 std::__invoke<void * ( *)(void *), INDI::BaseClient *>                                                       invoke.h              95  0x55555612d3fd 
12 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::_M_invoke<0ul, 1ul>              thread                244 0x55555612d2cd 
13 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::operator()                       thread                251 0x55555612d259 
14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>>::_M_run thread                195 0x55555612d217 
15 ??                                                                                                                                     0x7ffff4e71f74 
16 start_thread                                                                                                 pthread_create.c      479 0x7ffff51e2669 
17 clone                                                                                                        clone.S               95  0x7ffff4b61323
Comment 8 Jasem Mutlaq 2020-02-23 08:35:43 UTC
I've seen this before but couldn't figure out the root cause :(
Comment 9 TallFurryMan 2020-02-23 10:32:22 UTC
Refining the test, it appears the issue may source from the intrication of signals between gadgets. The code is re-entring the event loop at various places, and if the INDI server is still busy deleting properties, the code locks up in QMetaObject::Activate. Originally I thought it was the way QTest is implemented, which requires lots of deferred calls to manipulate the UI, but it seems I got the test right in the end.

Jasem, I pushed D27593 with the test showing the issue. Remove the delay as specified in the differential to reproduce.
Comment 10 TallFurryMan 2020-02-23 10:37:40 UTC
When the issue is reproduced, this is the stack from the event clicking the "stop" button:

1   __GI___pthread_timedjoin_ex                                                                                                                                        pthread_join_common.c 142  0x7ffff51e3cd7 
2   std::thread::join()                                                                                                                                                                           0x7ffff4e721d7 
3   INDI::BaseClient::disconnectServer                                                                                                                                 baseclient.cpp        291  0x55555611fe0c 
4   DriverManager::stopDevices                                                                                                                                         drivermanager.cpp     464  0x555555e08b07 
5   Ekos::Manager::cleanDevices                                                                                                                                        manager.cpp           1087 0x555555657e19 
6   Ekos::Manager::stop                                                                                                                                                manager.cpp           532  0x555555653d89 
7   Ekos::Manager::processINDI                                                                                                                                         manager.cpp           527  0x555555653d41 
8   QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (Ekos::Manager:: *)()>::call(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *) qobjectdefs_impl.h    152  0x5555556934b6 
9   QtPrivate::FunctionPointer<void (Ekos::Manager:: *)()>::call<QtPrivate::List<>, void>(void (Ekos::Manager:: *)(), Ekos::Manager *, void * *)                       qobjectdefs_impl.h    185  0x55555568e89f 
10  QtPrivate::QSlotObject<void (Ekos::Manager:: *)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void * *, bool *)                  qobjectdefs_impl.h    414  0x555555686354 
11  QMetaObject::activate(QObject *, int, int, void * *)                                                                                                                                          0x7ffff57295c8 
12  QAbstractButton::clicked(bool)                                                                                                                                                                0x7ffff632e236 
13  ??                                                                                                                                                                                            0x7ffff632e45e 
14  ??                                                                                                                                                                                            0x7ffff632f8a3 
15  QAbstractButton::mouseReleaseEvent(QMouseEvent *)                                                                                                                                             0x7ffff632fa65 
16  QWidget::event(QEvent *)                                                                                                                                                                      0x7ffff627c04e 
17  QApplicationPrivate::notify_helper(QObject *, QEvent *)                                                                                                                                       0x7ffff6239a86 
18  QApplication::notify(QObject *, QEvent *)                                                                                                                                                     0x7ffff6243053 
19  QTest::mouseEvent                                                                                                                                                  qtestmouse.h          242  0x55555562bd74 
20  QTest::mouseEvent                                                                                                                                                  qtestmouse.h          206  0x55555562b87c 
... <More>                                                                                                                                                                                                       


This is the stack from the INDI client thread:

1  syscall                                                                                                      syscall.S             38  0x7ffff4b5a94d 
2  QSemaphore::acquire(int)                                                                                                               0x7ffff55390b1 
3  QMetaObject::activate(QObject *, int, int, void * *)                                                                                   0x7ffff57298e5 
4  ClientManager::removeINDIProperty                                                                            moc_clientmanager.cpp 353 0x555555db8598 
5  ClientManager::removeProperty                                                                                clientmanager.cpp     109 0x555555e197e5 
6  INDI::BaseClient::delPropertyCmd                                                                             baseclient.cpp        625 0x5555561216b4 
7  INDI::BaseClient::dispatchCommand                                                                            baseclient.cpp        538 0x555556120d83 
8  INDI::BaseClient::listenINDI                                                                                 baseclient.cpp        505 0x555556120b07 
9  INDI::BaseClient::listenHelper                                                                               baseclient.cpp        374 0x555556120203 
10 std::__invoke_impl<void *, void * ( *)(void *), INDI::BaseClient *>                                          invoke.h              60  0x55555612dcc0 
11 std::__invoke<void * ( *)(void *), INDI::BaseClient *>                                                       invoke.h              95  0x55555612db63 
12 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::_M_invoke<0ul, 1ul>              thread                244 0x55555612da33 
13 std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>::operator()                       thread                251 0x55555612d9bf 
14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void * ( *)(void *), INDI::BaseClient *>>>::_M_run thread                195 0x55555612d97d 
15 ??                                                                                                                                     0x7ffff4e71f74 
16 start_thread                                                                                                 pthread_create.c      479 0x7ffff51e2669 
17 clone                                                                                                        clone.S               95  0x7ffff4b61323 


Two other INDI client threads are waiting on select(). I'm not sure why they are there because I don't know the Ekos client that much, but that's probably minor for this issue.
Comment 11 TallFurryMan 2020-03-25 07:17:24 UTC
Reproduced on my live (not current) system, confirming test is OK:

^C[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
0xb7f5bd09 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7f5bd09 in __kernel_vsyscall ()
#1  0xb522a7fe in __GI___pthread_timedjoin_ex (threadid=2750102336, thread_return=0x0, abstime=0x0, block=true) at pthread_join_common.c:89
#2  0xb522a604 in __pthread_join (threadid=2750102336, thread_return=0x0) at pthread_join.c:24
#3  0xb4f0e7b5 in std::thread::join() () at /usr/lib/i386-linux-gnu/libstdc++.so.6
#4  0x00c6d1bf in INDI::BaseClient::disconnectServer() (this=0x5de1f68) at /home/tallfurryman/dev/indinew/libs/indibase/baseclient.cpp:291
#5  0x0072121b in DriverManager::stopDevices(QList<DriverInfo*> const&) (this=0x5679ba0, dList=...)
    at /home/tallfurryman/dev/kstars/kstars/indi/drivermanager.cpp:464
#6  0x007b51ab in Ekos::Manager::cleanDevices(bool) (this=0x2e879b0, stopDrivers=true)
    at /home/tallfurryman/dev/kstars/kstars/ekos/manager.cpp:1017
#7  0x007b525a in Ekos::Manager::stop() (this=0x2e879b0) at /home/tallfurryman/dev/kstars/kstars/ekos/manager.cpp:498
#8  0x00c19160 in EkosAdaptor::stop() (this=0x55b9530) at /home/tallfurryman/dev/kstars_build/kstars/ekosadaptor.cpp:111
#9  0x00c19160 in EkosAdaptor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0x55b9530, _c=QMetaObject::InvokeMetaMethod, _id=13, _a=0xbfb29484) at /home/tallfurryman/dev/kstars_build/kstars/ekosadaptor.moc:210
#10 0x00c19614 in EkosAdaptor::qt_metacall(QMetaObject::Call, int, void**) (this=0x55b9530, _c=QMetaObject::InvokeMetaMethod, _id=13, _a=0xbfb29484) at /home/tallfurryman/dev/kstars_build/kstars/ekosadaptor.moc:314
#11 0xb5ca850c in  () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5
#12 0xb5cad87a in  () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5
#13 0xb5caddd8 in  () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5
#14 0xb5caea06 in  () at /usr/lib/i386-linux-gnu/libQt5DBus.so.5
Comment 12 TallFurryMan 2020-03-25 07:20:00 UTC
Updated version affected to 3.4.0.
Comment 13 Jasem Mutlaq 2020-03-25 15:48:32 UTC
Git commit f8c3d1359ac456449f2a7dc507ca8f4f73a8d406 by Jasem Mutlaq.
Committed on 25/03/2020 at 15:45.
Pushed by mutlaqja into branch 'master'.

Change remoteProperty connection to the default queued connection.
Related: bug 416173
FIXED-IN:3.4.2

M  +15   -16   kstars/ekos/capture/capture.cpp
M  +2    -1    kstars/ekos/capture/capture.h
M  +21   -21   kstars/ekos/guide/guide.cpp
M  +2    -1    kstars/ekos/guide/guide.h
M  +1    -2    kstars/indi/drivermanager.cpp
M  +6    -2    kstars/indi/guimanager.cpp
M  +41   -31   kstars/indi/indidevice.cpp
M  +51   -43   kstars/indi/indidevice.h

https://commits.kde.org/kstars/f8c3d1359ac456449f2a7dc507ca8f4f73a8d406
Comment 14 TallFurryMan 2020-03-25 21:03:06 UTC
Jasem, could you confirm you adjusted the relevant test to verify your change?

Tests/kstars_ui/test_ekos.cpp, in TestEkos::testSimulatorProfile, line 134.

I don't see this in your changelog.
Comment 15 Jasem Mutlaq 2020-03-26 06:51:18 UTC
Everything passed, except this:

XFAIL  : KStarsUiTests::initialConditionsTest() Initial KStars clock is set from system local time, not geolocation, and is untestable for now.
   Loc: [/home/jasem/Projects/kstars/Tests/kstars_ui/kstars_ui_tests.cpp(96)]
XFAIL  : KStarsUiTests::initialConditionsTest() Precision of KStars local time conversion to local time does not allow strict millisecond comparison.
   Loc: [/home/jasem/Projects/kstars/Tests/kstars_ui/kstars_ui_tests.cpp(99)]

Btw, I couldn't get it to crash even before.