Bug 406950 - Valgrind reports use after free
Summary: Valgrind reports use after free
Status: VERIFIED FIXED
Alias: None
Product: kstars
Classification: Applications
Component: general (other bugs)
Version First Reported In: 3.2.0
Platform: Mint (Ubuntu based) Linux
: NOR normal
Target Milestone: ---
Assignee: Jasem Mutlaq
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-27 04:35 UTC by elladan
Modified: 2019-05-02 13:26 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed/Implemented In: 3.2.2
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description elladan 2019-04-27 04:35:46 UTC
SUMMARY

Was having some stability problems (see also bug #406638) so I decided to run kstars under Valgrind on amd64.

Valgrind reported a large number of troubling uninitialized data errors, but the most critical one I saw after a bit of testing was a clear use-after-free.

STEPS TO REPRODUCE
1. Start kstars under valgrind with some appropriate suppressions.
2. Open EKOS and start indi with simulators. I used a guider as well as a CCD.
3. This error appeared around the time I clicked "capture" in the Guide window, though it seemed somewhat random.

OBSERVED RESULT

org.kde.kstars.ekos.align: "Capturing image..."
org.kde.kstars.fits: Loading FITS file  "/tmp/fitsu28517.fits"
Found one coordinate representation.
org.kde.kstars.ekos.align: "Image received."
org.kde.kstars.ekos.align: "Capturing dark frame..."
org.kde.kstars.fits: Loading FITS file  "/tmp/fitsN28517.fits"
org.kde.kstars.ekos.align: "Dark frame received."
org.kde.kstars.fits: Loading FITS file  "/tmp/fitsN28517.fits"
org.kde.kstars.fits: Saved FITS file: "/home/elladan/.local/share/kstars/darks/darkframe_2019-04-26T18-42-24.fits"
org.kde.kstars.ekos.align: "Dark frame saved to /home/elladan/.local/share/kstars/darks/darkframe_2019-04-26T18-42-24.fits"
org.kde.kstars.ekos.align: "Starting solver..."
==28517== Thread 8 Thread (pooled):
==28517== Invalid write of size 1
==28517==    at 0x35114C: operator() (fitsview.cpp:603)
==28517==    by 0x35114C: QtConcurrent::StoredFunctorCall0<void, bool FITSView::rescale<unsigned short>(FITSZoom)::{lambda()#1}>::runFunctor() (qtconcurrentstoredfunctioncall.h:70)
==28517==    by 0x35017A: QtConcurrent::RunFunctionTask<void>::run() (qtconcurrentrunbase.h:136)
==28517==    by 0xA4392B1: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==28517==    by 0xA43C17C: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==28517==    by 0xB5526DA: start_thread (pthread_create.c:463)
==28517==    by 0xC7EE88E: clone (clone.S:95)
==28517==  Address 0x7b56f418 is 730,072 bytes inside a block of size 24,000,000 free'd
==28517==    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28517==    by 0x9886A44: QImageData::~QImageData() (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9886AB6: QImage::~QImage() (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9887C98: QImage::detach() (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9887D26: QImage::scanLine(int) (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x3510F7: operator() (fitsview.cpp:599)
==28517==    by 0x3510F7: QtConcurrent::StoredFunctorCall0<void, bool FITSView::rescale<unsigned short>(FITSZoom)::{lambda()#1}>::runFunctor() (qtconcurrentstoredfunctioncall.h:70)
==28517==    by 0x35017A: QtConcurrent::RunFunctionTask<void>::run() (qtconcurrentrunbase.h:136)
==28517==    by 0xA4392B1: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==28517==    by 0xA43C17C: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==28517==    by 0xB5526DA: start_thread (pthread_create.c:463)
==28517==    by 0xC7EE88E: clone (clone.S:95)
==28517==  Block was alloc'd at
==28517==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28517==    by 0x9886D1E: QImageData::create(QSize const&, QImage::Format) (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9886E9A: QImage::QImage(QSize const&, QImage::Format) (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9886ED4: QImage::QImage(int, int, QImage::Format) (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x98874B2: QImage::copy(QRect const&) const (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9887C7E: QImage::detach() (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x9887D26: QImage::scanLine(int) (in /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.5)
==28517==    by 0x3510F7: operator() (fitsview.cpp:599)
==28517==    by 0x3510F7: QtConcurrent::StoredFunctorCall0<void, bool FITSView::rescale<unsigned short>(FITSZoom)::{lambda()#1}>::runFunctor() (qtconcurrentstoredfunctioncall.h:70)
==28517==    by 0x35017A: QtConcurrent::RunFunctionTask<void>::run() (qtconcurrentrunbase.h:136)
==28517==    by 0xA4392B1: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==28517==    by 0xA43C17C: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==28517==    by 0xB5526DA: start_thread (pthread_create.c:463)
==28517== 

EXPECTED RESULT

Use before free. :-)

SOFTWARE/OS VERSIONS

Kstars: Build: 2019-04-14T19:19:24Z
Linux Mint 19.1 (Tessa)
Comment 1 Jasem Mutlaq 2019-04-27 06:15:34 UTC
Thanks for looking into this. So right using the threading caused some regressions. I'll look into this, since you already probably know more about this than I do, any idea what sequence of event leads to this?

How is the FITSView image freed before rescale? how does this happen?
Comment 2 Jasem Mutlaq 2019-04-30 05:25:42 UTC
Bug #407074 is related.

Any idea what sequence of events leads to use after free? Still trying to pin out the issue so any help would be appreciated.
Comment 3 elladan 2019-04-30 06:12:28 UTC
(In reply to Jasem Mutlaq from comment #2)
> Bug #407074 is related.
> 
> Any idea what sequence of events leads to use after free? Still trying to
> pin out the issue so any help would be appreciated.

I'll attempt a more specific repro as soon as I get a chance.
Comment 4 Jasem Mutlaq 2019-05-01 08:28:33 UTC
Git commit 161d65aa7d10bd7f17b6a7006e8f3bde24f2227f by Jasem Mutlaq.
Committed on 01/05/2019 at 08:25.
Pushed by mutlaqja into branch 'master'.

If previous image is not done loading, wait until the future is completed.

This could fix:
Related: bug 407088, bug 407089, bug 407074

FIXED-IN:3.2.2

M  +2    -0    kstars/fitsviewer/fitsview.cpp

https://commits.kde.org/kstars/161d65aa7d10bd7f17b6a7006e8f3bde24f2227f
Comment 5 elladan 2019-05-02 02:04:58 UTC
I built kstars with your bug fix and then attempted to repro under valgrind. I did not see any use after free errors. Great!

Note that running kstars under valgrind is somewhat frustrating. You do need a release build or it's simply too slow, and I needed a number of suppressions to deal with the 15 million errors generated in various system libraries (especially DRM).

In addition, you need to be very patient with a number of the GUI actions. I found that kstars would often deadlock when doing things like starting or stopping the indi server, and I was forced to kill it (and crashed my window manager) several times. Clicking around while the indi server connection (slowly) processed would certainly put kstars in a bad state.

Here's an example where I had to manually terminate the process (the SEGV is artificial):

==29697== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==29697==    at 0xB333D2D: __pthread_timedjoin_ex (pthread_join_common.c:89)
==29697==    by 0xBC2B8C2: std::thread::join() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==29697==    by 0x8D122E: INDI::BaseClient::disconnectServer() (in /home/elladan/astro/dev/kstars/build/kstars/kstars/kstars)
==29697==    by 0x42D331: DriverManager::stopDevices(QList<DriverInfo*> const&) (drivermanager.cpp:464)
==29697==    by 0x4A40B8: Ekos::Manager::cleanDevices(bool) (manager.cpp:984)
==29697==    by 0x4A4141: Ekos::Manager::stop() (manager.cpp:492)
==29697==    by 0xA42977E: QMetaObject::activate(QObject*, int, int, void**) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.9.5)
==29697==    by 0x8CFEBA1: QAbstractButton::clicked(bool) (in /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.5)
==29697==    by 0x8CFEDB9: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.5)
==29697==    by 0x8D00199: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.5)
==29697==    by 0x8D0038C: QAbstractButton::mouseReleaseEvent(QMouseEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.5)
==29697==    by 0x8C4C047: QWidget::event(QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.5)
Comment 6 Jasem Mutlaq 2019-05-02 05:03:29 UTC
How about running just the release build (without valgrind). Any stability issues encountered??
Comment 7 Jasem Mutlaq 2019-05-02 05:05:07 UTC
Yes, that is a known issue you reported it sometimes happen in the INDI client library thread. Sometimes it gets stuck on pthread_join as the thread does not exist cleanly. I haven't found what leads to this situation sometimes, it's rare but happens.
Comment 8 elladan 2019-05-02 10:29:54 UTC
(In reply to Jasem Mutlaq from comment #6)
> How about running just the release build (without valgrind). Any stability
> issues encountered??

It's difficult for me to quantify that: the nature of memory corruption is that you experience uncorrelated crashes.

More importantly: I just started using kstars so I can't compare to old versions (which apparently had fewer threads?), but for me 3.x is extremely unstable. I'm not sure I've ever had a session which did not crash.

For example, I attempted to repro Bug #406638 with this bug fix on my Raspberry pi to see if it was related, and I just gave up for now. indi_simulator_guide crashes 100% of the time when I run through my repro sequence on ARM today (it crashes when I capture in the Guide tab). I also experienced kstars crashing but wasn't able to get any debug info from the core.

I also saw at least one crash with this patch on amd64 but didn't acquire any usable debug info there either.

I think the patch improves things, at least.

(In reply to Jasem Mutlaq from comment #7)
> does not exist cleanly. I haven't found what leads to this situation
> sometimes, it's rare but happens.

Have I got a nice solution for you! Run it under valgrind, all sorts of fun race conditions appear. :-)

More generally, I'd suggest adding random time jitter debugging to make these sorts of bugs appear more often. For example, build with a random sleep around lock/unlock calls and then test.
Comment 9 Jasem Mutlaq 2019-05-02 13:26:28 UTC
We made heavy use of threading in 3.x and this is probably the reason why it got unstable (but more responsive I'd say). Hopefully we can tackle down the rest of the issues, any pointers would be appreciated if you found any patterns of interest.