SUMMARY *** Valgrind seems to throw an error when checking my code. Error is as follows: *** STEPS TO REPRODUCE 1. Download and build https://github.com/sbromberger/finddups-cpp 2. run `valgrind --leak-check=yes ./finddups` OBSERVED RESULT https://dpaste.org/OTor0: ``` seth@hydrogen:~/dev/cpp/finddups master $ valgrind --leak-check=yes ./finddups . ==6333== Memcheck, a memory error detector ==6333== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==6333== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==6333== Command: ./finddups . ==6333== min size: 0, max size: 9223372036854775807, includes: [], excludes: [] sizemap size = 181 finished sizemap valgrind: m_debuginfo/image.c:586 (set_CEnt): Assertion '!sr_isError(sr)' failed. host stacktrace: ==6333== at 0x58065812: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x58065867: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x5806887B: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x580C4DA5: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x5813A08A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x5813E232: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x58144CAC: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x581B4456: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==6333== by 0x581B5CB8: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) sched status: running_tid=1 Thread 1: status = VgTs_Runnable syscall 9 (lwpid 6333) ==6333== at 0x4D0F0A2: __mmap64 (mmap64.c:58) ==6333== by 0x4D0F0A2: mmap (mmap64.c:46) ==6333== by 0x11292A: hashmap(std::unordered_map<unsigned long, std::vector<std::filesystem::__cxx11::directory_entry, std ::allocator<std::filesystem::__cxx11::directory_entry> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::alloca tor<std::pair<unsigned long const, std::vector<std::filesystem::__cxx11::directory_entry, std::allocator<std::filesystem::__c xx11::directory_entry> > > > > const&, Config const&) (in /home/seth/dev/cpp/finddups/finddups) ==6333== by 0x110073: main (in /home/seth/dev/cpp/finddups/finddups) client stack range: [0x1FFEFFE000 0x1FFF000FFF] client SP: 0x1FFEFFFA58 valgrind stack range: [0x1002CB2000 0x1002DB1FFF] top usage: 13144 of 1048576 <Note and other boilerplate text elided> ``` EXPECTED RESULT Something else (unknown as this is my first time running valgrind). SOFTWARE/OS VERSIONS Linux: Void Linux `Void 6.6.11_1 x86_64 AuthenticAMD uptodate rFFF` ADDITIONAL INFORMATION valgrind on system utilities (`df` and `ls` tested) seems to work fine.
First problem, your code didn't compile straight away. Here is a diff. diff --git a/finddups.cpp b/finddups.cpp index ec44c03..f22dba5 100644 --- a/finddups.cpp +++ b/finddups.cpp @@ -11,6 +11,7 @@ #include <iostream> #include <sys/mman.h> #include <unordered_map> +#include <unistd.h> constexpr unsigned long PAGESZ = 4096UL; @@ -194,7 +195,7 @@ int main(int argc, char *argv[]) { } std::cerr << "Total time: " << std::chrono::duration_cast<std::chrono::milliseconds>( - std::chrono::high_resolution_clock::now() - start) + std::chrono::high_resolution_clock::now() - start).count() << "\n"; return 0; } There are also a few warnings /home/paulf/scratch/finddups-cpp/finddups.cpp:49:31: warning: missing field 'includes' initializer [-Wmissing-field-initializers] return Config{min_sz, max_sz}; ^ /home/paulf/scratch/finddups-cpp/finddups.cpp:93:69: warning: unused parameter 'cfg' [-Wunused-parameter] entry_hash_map hashmap(const entry_size_map &sizemap, const Config &cfg) { ^ /home/paulf/scratch/finddups-cpp/finddups.cpp:16:25: warning: unused variable 'PAGESZ' [-Wunused-const-variable] constexpr unsigned long PAGESZ = 4096UL; ^ Second problem, from your report host stacktrace: ==6333== at 0x58065812: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) It looks like you are using a crappy distro that has a package maintainer that believes that they know better than the Valgrind developers and has ignored our advice not to strip the Valgrind binaries. Your distro may have a "valgrind-dbg" or "valgrind-dbgsym" package or something like that which would contain the split debuginfo files. Reproducing the error may depend where you are running the exe. When I run it on the project build directory I get no problem. When I run it as non-root on "/usr" I get an abort. I had to make several changes to use the overloads that use std::error_code since you don't catch exceptions in your directory iterating code. However, in your case "sizemap size" is only 181 (for my /usr it is 24699, my build directory is 176). So I assume that you're running it in your project build directory. On my system, no problem (FreeBSD 13.2, clang 14.0.5). Now looking at the source code. There are 5 calls to set_Cent. I can't tell which one it is in your case because of your stripped binary. Can you try again, either with the Valgrind debuginfo package (if your distro has one and it works) or with Valgrind built from source?
Thanks for the tips. On my system it compiles fine (using the CMakeFile.txt with clang, also with `clang++ -o finddups -g --std=c++20 *.cpp` but I have the dependencies built already). I don't think you need to call a `count()` method on the difference: https://en.cppreference.com/w/cpp/chrono/high_resolution_clock/now. I also don't seem to need `unistd.h` on my system. The warnings are innocuous, as they represent functionality that exists in other implementations (PAGESZ: I'm benchmarking the file accesses) and future options (ability to specify include/excludes). As far as the version of valgrind, this is simply what void packages: https://github.com/void-linux/void-packages/tree/master/srcpkgs/valgrind) - I didn't do any customization. I'll look around for one with debug symbols. Failing that, I'll try to build from source.
I found the package with debug symbols. Here's the new output. Does this help? ``` valgrind: m_debuginfo/image.c:586 (set_CEnt): Assertion '!sr_isError(sr)' failed. host stacktrace: ==21768== at 0x58065812: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x58065867: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x5806887B: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x580C4DA5: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x5813A08A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x5813E232: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x58144CAC: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x581B4456: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) ==21768== by 0x581B5CB8: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux) sched status: running_tid=1 Thread 1: status = VgTs_Runnable syscall 9 (lwpid 21768) ==21768== at 0x4D5C0A2: __mmap64 (mmap64.c:58) ==21768== by 0x4D5C0A2: mmap (mmap64.c:46) ==21768== by 0x112127: hashmap(std::unordered_map<unsigned long, std::vector<std::filesystem::__cxx11::directory_entry, std::allocator<std::filesystem::__cxx11::directory_entry> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::vector<std::filesystem::__cxx11::directory_entry, std::allocator<std::filesystem::__cxx11::directory_entry> > > > > const&, Config const&) (finddups.cpp:111) ==21768== by 0x112FF0: main (finddups.cpp:183) client stack range: [0x1FFEFFE000 0x1FFF000FFF] client SP: 0x1FFEFFF818 valgrind stack range: [0x1002DB2000 0x1002EB1FFF] top usage: 17080 of 1048576 ```
No, it's a proper stacktrace with file names and line numbers that I need. Can you build from source and retry?
(In reply to Paul Floyd from comment #4) > No, it's a proper stacktrace with file names and line numbers that I need. > > Can you build from source and retry? The one built from source does not error. It ends with ``` =19375==End of process memory map. ==19375== ==19375== HEAP SUMMARY: ==19375== in use at exit: 0 bytes in 0 blocks ==19375== total heap usage: 88 allocs, 88 frees, 2,951 bytes allocated ==19375== ==19375== All heap blocks were freed -- no leaks are possible ==19375== ==19375== For lists of detected and suppressed errors, rerun with: -s ==19375== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ```
I found the problem: the finddups code was incorrectly checking `open` for error using `0` - it needed to be `-1`. When this changed, valgrind (both the package version and the one built from source) started working.
I spoke too soon - it started crashing again. (I don't think it's my code). Here's the full output from a custom build of valgrind with the symbols: ``` valgrind: m_debuginfo/image.c:586 (set_CEnt): Assertion '!sr_isError(sr)' failed. host stacktrace: ==27068== at 0x5804383A: show_sched_status_wrk (m_libcassert.c:407) ==27068== by 0x58043957: report_and_quit (m_libcassert.c:478) ==27068== by 0x58043AE7: vgPlain_assert_fail (m_libcassert.c:544) ==27068== by 0x580C4B9F: set_CEnt (image.c:586) ==27068== by 0x580C4DD1: vgModuleLocal_img_from_fd (image.c:913) ==27068== by 0x58083B9F: vgModuleLocal_check_elf_and_get_rw_loads (readelf.c:3820) ==27068== by 0x58070609: vgPlain_di_notify_mmap (debuginfo.c:1393) ==27068== by 0x580A0031: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2466) ==27068== by 0x580ABA6F: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:413) ==27068== by 0x5809C6A9: vgPlain_client_syscall (syswrap-main.c:2240) ==27068== by 0x580981EA: handle_syscall (scheduler.c:1206) ==27068== by 0x5809A1C6: vgPlain_scheduler (scheduler.c:1552) ==27068== by 0x580E5279: thread_wrapper (syswrap-linux.c:102) ==27068== by 0x580E5279: run_a_thread_NORETURN (syswrap-linux.c:155) sched status: running_tid=1 Thread 1: status = VgTs_Runnable syscall 9 (lwpid 27068) ==27068== at 0x4D5C0A2: __mmap64 (mmap64.c:58) ==27068== by 0x4D5C0A2: mmap (mmap64.c:46) ==27068== by 0x112017: hashmap(std::unordered_map<unsigned long, std::vector<std::filesystem::__cxx11::directory_entry, st d::allocator<std::filesystem::__cxx11::directory_entry> >, std::hash<unsigned long>, std::equal_to<unsigned long>, std::alloc ator<std::pair<unsigned long const, std::vector<std::filesystem::__cxx11::directory_entry, std::allocator<std::filesystem::__ cxx11::directory_entry> > > > > const&, Config const&) (finddups.cpp:111) ==27068== by 0x112F05: main (finddups.cpp:183) client stack range: [0x1FFEFFE000 0x1FFF000FFF] client SP: 0x1FFEFFF6C8 valgrind stack range: [0x1002EF3000 0x1002FF2FFF] top usage: 12616 of 1048576 ```
I need to check the generic mmap wrapper to see if it handles the case of an mmap with a failed open fd.
(In reply to Paul Floyd from comment #8) > I need to check the generic mmap wrapper to see if it handles the case of an > mmap with a failed open fd. I don't really know how any of this works, but in my tests I didn't have any mmap / open failures. Does valgrind simulate them?
Loading of shared libraries is done with mmap. So whenever Valgrind sees an mmap it reads the file to see if is an ELF binary and to determine if it needs to read the DWARF debuginfo. The assert that you are seeing happens during this check (vgModuleLocal_check_elf_and_get_rw_loads). It looks like the mmap syscall is succeeding (which would also mean that the open succeeded). The failure is here SysRes sr = VG_(pread)(img->source.fd, &ce->data[0], (Int)len, off); vg_assert(!sr_isError(sr)) I'll see if I can reproduce the problem on Linux.
OK I've found the problem. You're opening the files with O_DIRECT. But then Valgrind is just using a pread based on the size to see if it is an ELF load. That's causing a EINVAL because of this EINVAL fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the file offset is not suitably aligned. As a workaround, you can just remove the O_DIRECT. In Valgrind, handling the memory restrictions looks complicated. Instead I'll look at 1) getting the fd flags with fcntl, 2) checking if O_DIRECT is set 3) turning it off if it is, 4) doing the pread and 5) restoring the flag if needed As a reminder to myself, the code in question is if (img->source.is_local) { // Simple: just read it SysRes sr = VG_(pread)(img->source.fd, &ce->data[0], (Int)len, off); vg_assert(!sr_isError(sr)); in image.c
Removing O_DIRECT does indeed appear to fix the problem. Since there's no significant performance impact to removing it, I'll do that for now. Thank you for helping me track this down. Also, thanks for the suggestion re: debug symbols. I've opened up an issue with my distribution: https://github.com/void-linux/void-packages/issues/48427
commit 2c8ecd648204a8be01714fa6878373f57e073d25 (HEAD -> master, origin/users/paulf/try-bug480405, origin/master, origin/HEAD, bug480405) Author: Paul Floyd <pjfloyd@wanadoo.fr> Date: Tue Jan 30 22:02:21 2024 +0100 Bug 480405 - valgrind 3.22.0 "m_debuginfo/image.c:586 (set_CEnt): Assertion '!sr_isError(sr)' failed."
This is a silly question: but it sort of sounds like Valgrind looks at any mmap-opened file to see if it's a shared library (that is used by the binary?). What if the file I'm opening happens to be a shared library that is NOT used by the binary?
(In reply to bugskde from comment #14) > This is a silly question: but it sort of sounds like Valgrind looks at any > mmap-opened file to see if it's a shared library (that is used by the > binary?). What if the file I'm opening happens to be a shared library that > is NOT used by the binary? Loading an executable binary doesn't just mmap the entire file. ELF (and macho on macOS) files are split into segments and sections. Segments like "NOTE" (info related to the OS and architecture) don't need to be mapped. It's only the LOAD segments that matter. And even then we aren't interested in the read-only segment as it should never be the source of errors. So there is complex code to determine when the read-write (global data) and read-execute (the program text) has been loaded. Once it's been determined that all RW and RX LOAD segments have been mmaped then we trigger reading any DWARF debuginfo. Platforms vary a lot in how they do this, and linkers keep coming up with tweaks to try to improve the memory layout and/or make the binary files smaller.