Created attachment 143640 [details]
swi-pl crash log when running under valgrind

SUMMARY
Running SWI-Prolog under Valgrind on FreeBSD 12.2 results in "unhandled syscall 247" warnings followed by a swi-pl crash. According to /usr/src/sys/sys/syscall.h, syscall 247 is SYS_clock_getcpuclockid2.

STEPS TO REPRODUCE
1. Run `valgrind swipl` on FreeBSD

OBSERVED RESULT
Many "unhandled amd64-freebsd syscall: 247" warnings are printed, followed by a swi-pl segfault (see the attached log file).

EXPECTED RESULT
No warnings; swi-pl should work normally under valgrind.

SOFTWARE/OS VERSIONS
FreeBSD DaemONX 12.2-RELEASE-p7 FreeBSD 12.2-RELEASE-p7 GENERIC amd64
valgrind-3.18.1
SWI-Prolog version 8.2.3 for amd64-freebsd

ADDITIONAL INFORMATION
Looking at syscalls.master, the missing syscall wrapper seems fairly straightforward:

247	AUE_NULL	STD {
		int clock_getcpuclockid2(
		    id_t id,
		    int which,
		    _Out_ clockid_t *clock_id
		);
	}

The crash may be unrelated.
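For context, here is a minimal sketch of the kind of user-level call that can lead to this syscall. It uses the POSIX functions clock_getcpuclockid() and clock_gettime(); the assumption (not confirmed in this report) is that FreeBSD's libc implements clock_getcpuclockid() on top of clock_getcpuclockid2. The function name process_cpu_time is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Read the CPU time consumed so far by the calling process.
 * Returns 0 on success, or an errno-style error number on failure. */
int process_cpu_time(struct timespec *out)
{
    clockid_t cid;

    /* POSIX wrapper; assumed to reach syscall 247 on FreeBSD. */
    int err = clock_getcpuclockid(getpid(), &cid);
    if (err != 0)
        return err;            /* this function returns the error directly */

    if (clock_gettime(cid, out) != 0)
        return errno;          /* clock_gettime reports errors via errno */

    return 0;
}
```

Running a small program that calls process_cpu_time() under Valgrind would be one way to check whether the wrapper is handled, independently of swi-pl.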
I've done a super quick implementation of clock_getcpuclockid2 [it probably needs a separate i386 version because the 1st param is 64bits].

I now get

paulf> ./vg-in-place --soname-synonyms=somalloc=libtcmalloc_minimal.so.4 swipl
==4514== Memcheck, a memory error detector
==4514== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4514== Using Valgrind-3.19.0.GIT and LibVEX; rerun with -h for copyright info
==4514== Command: swipl
==4514== 
#Welcome to SWI-Prolog (threaded, 64 bits, version 8.2.3)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

==4514== brk segment overflow in thread #2: can't grow to 0x4820000
==4514== (see section Limitations in user manual)
==4514== NOTE: further instances of this message will not be shown
==4514== Thread 2 gc:
==4514== Invalid write of size 8
==4514==    at 0x4877C05: tcmalloc::ThreadCache::Init(pthread*) (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x4878C4D: tcmalloc::ThreadCache::NewHeap(pthread*) (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x4878A74: tcmalloc::ThreadCache::CreateCacheIfNecessary() (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x486E115: TCMallocImplementation::MarkThreadBusy() (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x4AA26F4: ??? (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x4AB8EC2: PL_next_solution (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x4AD08DB: PL_call_predicate (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x4B5C7BC: ??? (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x529082A: ??? (in /lib/libthr.so.3)
==4514==  Address 0x4021000 is not stack'd, malloc'd or (recently) free'd
==4514== 
==4514== 
==4514== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==4514==  Access not within mapped region at address 0x4021000
==4514==    at 0x4877C05: tcmalloc::ThreadCache::Init(pthread*) (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x4878C4D: tcmalloc::ThreadCache::NewHeap(pthread*) (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x4878A74: tcmalloc::ThreadCache::CreateCacheIfNecessary() (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x486E115: TCMallocImplementation::MarkThreadBusy() (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==4514==    by 0x4AA26F4: ??? (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x4AB8EC2: PL_next_solution (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x4AD08DB: PL_call_predicate (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x4B5C7BC: ??? (in /usr/local/lib/swipl/lib/amd64-freebsd/libswipl.so.8.2.3)
==4514==    by 0x529082A: ??? (in /lib/libthr.so.3)
==4514==  If you believe this happened as a result of a stack
==4514==  overflow in your program's main thread (unlikely but
==4514==  possible), you can try to increase the size of the
==4514==  main thread stack using the --main-stacksize= flag.
==4514==  The main thread stack size used in this run was 16777216.
If I increase the brk segment from 8M to 1G, that part of the message goes away, but the other parts remain. The diff to do that is

diff --git a/coregrind/m_initimg/initimg-freebsd.c b/coregrind/m_initimg/initimg-freebsd.c
index d19186a42..59c2f4f85 100644
--- a/coregrind/m_initimg/initimg-freebsd.c
+++ b/coregrind/m_initimg/initimg-freebsd.c
@@ -891,7 +891,7 @@ IIFinaliseImageInfo VG_(ii_create_image)( IICreateImageInfo iicii,
    //--------------------------------------------------------------
    {
       SizeT m1 = 1024 * 1024;
-      SizeT m8 = 8 * m1;
+      SizeT m8 = 8 * m1 * 128;
       SizeT dseg_max_size = (SizeT)VG_(client_rlimit_data).rlim_cur;
       VG_(debugLog)(1, "initimg", "Setup client data (brk) segment\n");
       if (dseg_max_size < m1) dseg_max_size = m1;

Note that questions about this arise frequently. The limit has to be hard-coded because it is set very early in tool startup, before it is possible to parse command-line arguments or read environment variables.
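As a quick sanity check of the arithmetic in that diff, the sketch below reproduces the two expressions. The helper name brk_cap_bytes is hypothetical and is not part of Valgrind; only the constants come from the diff.

```c
#include <stddef.h>

/* Value of the m8 expression for a given multiplier.
 * multiplier == 1 corresponds to the original "8 * m1",
 * multiplier == 128 to the patched "8 * m1 * 128". */
size_t brk_cap_bytes(size_t multiplier)
{
    size_t m1 = 1024 * 1024;    /* 1 MiB, as in initimg-freebsd.c */
    return 8 * m1 * multiplier;
}
```

With a multiplier of 1 this gives 8388608 bytes (8 MiB), and with 128 it gives 1073741824 bytes (1 GiB), which matches the "8M to 1G" description.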
I've pushed the change to resolve the missing syscall wrapper. Can you clone and build Valgrind (with the diff in my previous message) and also build debug versions of swi-pl and jemalloc? I'll also try to get in touch with the swi-pl maintainer. I'll leave this item open for the moment.
OK, but how can I build a debug version of jemalloc and then use it? As I understand it, it is FreeBSD's malloc? Do I run make in /usr/src/lib/libc/stdlib/jemalloc?
My mistake: /usr/local/lib/libtcmalloc_minimal.so.4.5.9 is Google perftools tcmalloc, not jemalloc.

https://www.freshports.org/devel/google-perftools/

Try getting a debug build of swi-pl first. If there is a problem, it is more likely to be there than in the memory allocator.
Created attachment 143711 [details] Running swi-pl under valgrind (with debug symbols)
I did the build in a jail so as not to mess up my system; I hope this isn't an issue. I built valgrind at f13667b1eff8d3d06590683b9981ced611bd3c69 plus the brk change, and a debug build of swi-pl 8.2.3. I've attached the log from running swi-pl under valgrind.
Does --track-origins=yes show anything useful?

What is happening here, at pl_thread_idle2_va (src/pl-alloc.c:1899)?

==59737== Thread 2 gc:
==59737== Conditional jump or move depends on uninitialised value(s)
==59737==    at 0x4077122: TCMallocImplementation::MarkThreadBusy() (in /usr/local/lib/libtcmalloc_minimal.so.4.5.9)
==59737==    by 0x42B68D2: pl_thread_idle2_va (src/pl-alloc.c:1899)
==59737==    by 0x42C86D9: PL_next_solution (src/pl-vmi.c:3839)
==59737==    by 0x42E0AD1: PL_call_predicate (src/pl-fli.c:4145)
==59737==    by 0x439B2C7: GCmain (src/pl-thread.c:5527)
==59737==    by 0x4AAAFAB: ??? (in /basejail/lib/libthr.so.3)

From what I can see online, the swi-pl code is doing this.

In initTCMalloc:

  fMallocExtension_MarkThreadBusy = PL_dlsym(NULL, "MallocExtension_MarkThreadBusy");

In PRED_IMPL("thread_idle", 2, thread_idle, PL_FA_TRANSPARENT):

  if ( fMallocExtension_MarkThreadBusy )
    fMallocExtension_MarkThreadBusy();

In tcmalloc:

class TCMallocImplementation : public MallocExtension {
  virtual void MarkThreadBusy();  // Implemented below
...
void TCMallocImplementation::MarkThreadBusy() {
  // Allocate to force the creation of a thread cache, but avoid
  // invoking any hooks.
  do_free(do_malloc(0));
}

Calling a C++ virtual function through a C pointer to function isn't safe. In this case the virtual function doesn't seem to use 'this'. do_free and do_malloc are inlined, so it's hard to see exactly what is going on in MarkThreadBusy.

But I wonder if tcmalloc needs some global initialization before these calls. Not sure how to debug that.
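To make the lookup pattern described above concrete, here is a minimal C sketch of an optional hook resolved at run time. It uses plain dlopen()/dlsym() rather than swi-pl's PL_dlsym, and it assumes the symbol, if present, is a plain C-linkage function taking no arguments. The names find_optional_hook, call_hook_if_present and demo_hook are hypothetical and exist only for illustration.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef void (*hook_fn)(void);

/* Look up an optional hook among the objects already loaded into the
 * process. Returns NULL when no such symbol exists. */
hook_fn find_optional_hook(const char *name)
{
    hook_fn fn = NULL;
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the main program */
    if (self == NULL)
        return NULL;
    /* Idiom from the POSIX dlsym documentation for void* to function pointer. */
    *(void **)(&fn) = dlsym(self, name);
    dlclose(self);
    return fn;
}

/* Call the hook only if it was found; returns 1 if called, 0 otherwise. */
int call_hook_if_present(hook_fn fn)
{
    if (fn == NULL)
        return 0;
    fn();
    return 1;
}

/* Hypothetical stand-in hook, used only to exercise the call path. */
int demo_hook_calls = 0;
void demo_hook(void) { demo_hook_calls++; }
```

The null check mirrors the `if ( fMallocExtension_MarkThreadBusy )` guard quoted above: when the allocator is not loaded, the lookup yields NULL and the call is skipped. On older glibc versions this may need linking with -ldl.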
I'm closing this as the unhandled syscall has been added. I'll continue to investigate the swi-pl problem when running under Valgrind and will open a new item if necessary.
Created attachment 143736 [details] Running swipl under valgrind with track-origins enabled
I attached a log with --track-origins=yes added.
Created attachment 143745 [details] Running swipl under valgrind with debug tcmalloc
I attached a log with the debug tcmalloc version. Is using sbrk normal? Isn't this a legacy thing?
Please could you move the discussion here: https://github.com/paulfloyd/freebsd_valgrind/issues/174