Bug 385386 - Assertion failed "szB >= CACHE_ENTRY_SIZE" on m_debuginfo/image.c:517
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.13.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-05 07:37 UTC by Pedro Ferreira
Modified: 2020-01-22 09:34 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Pedro Ferreira 2017-10-05 07:37:35 UTC
While attempting to run "kwrite" (from KDE), valgrind crashes with the message: 

valgrind: m_debuginfo/image.c:517 (realloc_CEnt): Assertion 'szB >= CACHE_ENTRY_SIZE' failed.

host stacktrace:
==15349==    at 0x58087103: show_sched_status_wrk (m_libcassert.c:355)
==15349==    by 0x58087204: report_and_quit (m_libcassert.c:426)
==15349==    by 0x58087399: vgPlain_assert_fail (m_libcassert.c:492)
==15349==    by 0x58128C0A: realloc_CEnt (image.c:517)
==15349==    by 0x58128C0A: get_slowcase (image.c:773)
==15349==    by 0x58128DD7: get (image.c:816)
==15349==    by 0x58128DD7: vgModuleLocal_img_get (image.c:1088)
==15349==    by 0x58128EC8: vgModuleLocal_img_get_ULong (image.c:1188)
==15349==    by 0x581303B3: get_ULong (readdwarf3.c:285)
==15349==    by 0x581307D2: get_UWord (readdwarf3.c:352)
==15349==    by 0x581307D2: make_general_GX (readdwarf3.c:691)
==15349==    by 0x58136212: parse_var_DIE (readdwarf3.c:2296)
==15349==    by 0x58136CF6: read_DIE (readdwarf3.c:4219)
==15349==    by 0x58136E96: read_DIE (readdwarf3.c:4280)
==15349==    by 0x58136E96: read_DIE (readdwarf3.c:4280)
==15349==    by 0x5813781D: new_dwarf3_reader_wrk.constprop.31 (readdwarf3.c:4757)
==15349==    by 0x5813993F: vgModuleLocal_new_dwarf3_reader (readdwarf3.c:5200)
==15349==    by 0x580C5D6B: vgModuleLocal_read_elf_debug_info (readelf.c:3111)
==15349==    by 0x580B91BA: di_notify_ACHIEVE_ACCEPT_STATE (debuginfo.c:748)
==15349==    by 0x580B91BA: vgPlain_di_notify_mmap (debuginfo.c:1063)
==15349==    by 0x580E2FBD: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2388)
==15349==    by 0x58117BF1: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:400)
==15349==    by 0x580DFA5A: vgPlain_client_syscall (syswrap-main.c:1857)
==15349==    by 0x580DC61A: handle_syscall (scheduler.c:1126)
==15349==    by 0x580DDB2E: vgPlain_scheduler (scheduler.c:1443)
==15349==    by 0x580ED146: thread_wrapper (syswrap-linux.c:103)
==15349==    by 0x580ED146: run_a_thread_NORETURN (syswrap-linux.c:156)


Kwrite (and a couple other components) were built with -O0 through Gentoo's emerge (I was looking for a separate bug).
I'm using Gentoo "Valgrind-3.13.0 and LibVEX" on x86_64.
Comment 1 Pedro Ferreira 2017-10-05 07:39:45 UTC
Just prior to the crash, the last log message was:

--15349-- Reading syms from /usr/lib64/libQt5Qml.so.5.7.1
--15349--   Considering /usr/lib/debug/usr/lib64/libQt5Qml.so.5.7.1.debug ..
--15349--   .. CRC is valid

Would attaching the debug symbols file help?
Comment 2 Julian Seward 2018-07-28 21:22:52 UTC
I can't imagine how this failed.  Can you still reproduce it?
Comment 3 Pedro Ferreira 2018-08-01 07:50:48 UTC
Despite my attempts, I am no longer able to trigger this.
I do not recall what bug I was looking at when I stumbled onto this, and thus can't retrace my steps.
Also, Gentoo has upgraded GCC since this was originally reported, so that might have had an effect on this as well.
I was confident I had saved the debugging symbols file somewhere in case it would be required, but can't find it.
*sigh* I suppose I am unable to provide you with additional information at present :(
Comment 4 Matt 2019-07-10 15:08:50 UTC
We are able to consistently reproduce this with Valgrind-3.15.0-608cb11914-20190413

(Different application, not kwrite)
Comment 5 Matt 2019-07-10 16:23:12 UTC
At the point of failure, the values are as follows:

szB=424
CACHE_ENTRY_SIZE=8192
Comment 6 Matt 2019-07-26 20:46:08 UTC
We found that the assertion is no longer hit when we converted our application from compressed to uncompressed debug symbols.
Comment 7 Reimar Döffinger 2019-10-08 14:23:29 UTC
This seems to be a logic bug in the realloc_CEnt function that was never adjusted for compressed symbol support.
alloc_CEnt has this logic:
   if (fromC) {
      // szB can be arbitrary
   } else {
      vg_assert(szB == CACHE_ENTRY_SIZE);
   }

However realloc_CEnt does not have such a fromC argument and unconditionally checks
vg_assert(szB >= CACHE_ENTRY_SIZE);
Shouldn't these simply be aligned in behaviour?
Unfortunately I can't share any examples, but I would greatly appreciate it if someone could check my logic and consider a patch based on it.
I think it requires a rather large binary with lots of debug symbols, as the cache re-uses compressed entries last, and that is when this bug happens.
Comment 8 Reimar Döffinger 2019-10-08 14:43:35 UTC
I can confirm that something as trivial as the patch below fixes it:

--- a/coregrind/m_debuginfo/image.c
+++ b/coregrind/m_debuginfo/image.c
@@ -509,10 +509,10 @@ static UInt alloc_CEnt ( DiImage* img, SizeT szB, Bool fromC )
    return entNo;
 }

-static void realloc_CEnt ( DiImage* img, UInt entNo, SizeT szB )
+static void realloc_CEnt ( DiImage* img, UInt entNo, SizeT szB, Bool fromC )
 {
    vg_assert(img != NULL);
-   vg_assert(szB >= CACHE_ENTRY_SIZE);
+   vg_assert(fromC || szB >= CACHE_ENTRY_SIZE);
    vg_assert(is_sane_CEnt("realloc_CEnt-pre", img, entNo));
    img->ces[entNo] = ML_(dinfo_realloc)("di.realloc_CEnt.1",
                                         img->ces[entNo],
@@ -768,7 +768,7 @@ static UChar get_slowcase ( DiImage* img, DiOffT off )
    }
    vg_assert(i >= 0 && i < CACHE_N_ENTRIES);

-   realloc_CEnt(img, i, size);
+   realloc_CEnt(img, i, size, /*fromC?*/cslc != NULL);
    img->ces[i]->size = size;
    img->ces[i]->used = 0;
    if (cslc == NULL) {
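
For readers who want to check the reasoning without building Valgrind, here is a minimal standalone sketch of the two assertion conditions, using the values reported in comment 5 (szB=424, CACHE_ENTRY_SIZE=8192). The function names `old_check` and `new_check` are hypothetical stand-ins introduced only for illustration; they model the boolean condition inside `vg_assert` and are not part of image.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value reported in comment 5; the real constant is defined in image.c. */
#define CACHE_ENTRY_SIZE 8192

/* Hypothetical stand-in for the condition asserted by realloc_CEnt
   before the patch: it ignores whether the entry holds data
   decompressed from a compressed section. */
bool old_check(size_t szB)
{
    return szB >= CACHE_ENTRY_SIZE;
}

/* Hypothetical stand-in for the relaxed condition from the patch:
   entries that come from compressed data (fromC) may have any size,
   while all other entries must still be at least CACHE_ENTRY_SIZE. */
bool new_check(size_t szB, bool fromC)
{
    return fromC || szB >= CACHE_ENTRY_SIZE;
}
```

Under these assumptions, `old_check(424)` is false, which corresponds to the reported assertion failure, while `new_check(424, true)` is true, so a 424-byte entry from a compressed section is accepted. An uncompressed entry of that size is still rejected by `new_check(424, false)`.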
Comment 9 Julian Seward 2020-01-22 09:34:40 UTC
Committed, 3542be5bdc706b1a7d5d080ea01e81d4791e20b4.  Thank you
for the patch and the analysis.