I tried 2.4.1.rc1 on my big multithreaded app: [asalwa@valgrind linux_x86]$ /opt/valgrind-2.4.0.rc1/bin/valgrind --suppressions=/home/valgrind/mhp4linux/mhp.supp --error-limit=no --leak-check=full --leak-resolution=high --freelist-vol=80000000 ./mhp10probe_internal_linux_x86 -stream ~/strumienie/gen_20040311_17x.tsop -no_shifted_pointers 1 ==31371== Memcheck, a memory error detector for x86-linux. ==31371== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al. ==31371== Using valgrind-2.4.0.rc1, a program supervision framework for x86-linux. ==31371== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al. ==31371== For more details, rerun with: -v ==31371== Report(): REPORT_INFO in file ../../impl/emu_linux/comm_sock.h, line 101, function CommSocket, OS function connect, message "connected to stbproxy, ready" ==31371== Conditional jump or move depends on uninitialised value(s) ==31371== at 0x81375CA: Win_MainMHPInitialization (MultiVideoStackImpl.h:596) ==31371== by 0x8129ADA: main (win_winmain.cpp:88) valgrind: vg_mylibc.c:1060 (string_match_wrk): Assertion `recDepth >= 0 && recDepth < 500' failed. ==31371== at 0xB003727B: vgPlain_skin_assert_fail (vg_mylibc.c:1168) ==31371== by 0xB003727A: assert_fail (vg_mylibc.c:1164) ==31371== by 0xB00372D5: vgPlain_core_assert_fail (vg_mylibc.c:1175) ==31371== by 0xB003709C: string_match_wrk (vg_mylibc.c:1060) ==31371== by 0xB0037103: string_match_wrk (vg_mylibc.c:1066) ==31371== by 0xB0037170: vgPlain_string_match (vg_mylibc.c:1097) ==31371== by 0xB001C56B: supp_matches_callers (vg_errcontext.c:991) ==31371== by 0xB001C5C4: is_suppressible_error (vg_errcontext.c:1009) ==31371== by 0xB001B9CC: vgPlain_maybe_record_error (vg_errcontext.c:552) ==31371== by 0xB7BBB153: vgMAC_record_param_error (mac_needs.c:481) ==31371== by 0xB7BBD7DF: mc_check_is_readable (mc_main.c:731) ==31371== by 0xB006E060: vgSkinInternal_pre_mem_read (vg_toolint.c:373) recDepth is global static variable. Are you sure it is correctly synchronized ?
It is easy to repeat in my environment. It happens every time, after a few seconds. If I remove mentioned assertion, then Valgrind goes further and my app works well.
The most obvious reason for this assertion would be that it is trying to match a wildcard in a suppression rule against a symbol more than 500 or so characters long. The assertion is just a safety check against run away recursion and you should be able to increase 500 to something larger. I suspect the reason you are now triggering this (the code in question hasn't changed) is that we now default to longer backtraces so it is probably seeing a symbol that it didn't before.
In such case 500 occurences of string_match_wrk should be visible on the callstack. But there are only 2. I investigated it in more detail. str passed to string_match_wrk is a binary garbage. It seems that caller_name prepared in supp_matches_callers is broken for some reason. I added following lines in string_match_wrk before assertion: VG_(printf)("running_tid==%d, recDepth==%d\n", running_tid, recDepth); if(recDepth==0) { VG_(printf)("pat==\"%s\", str==\"%s\"\n", pat, str); } (I had to make running_tid extern to make it compilable, of course). I will attach log with these additional messages in a few seconds.
Created attachment 10038 [details] log from Valgrind with additional messages
It shouldn't really matter what str is - it only recuses whan a * is seen in the pattern so unless you have 500 of those it shouldn't hit the limit. The bug is that the first case in the switch doesn't decrement recDepth so if you have one or more * characters in the pattern and a long (posibly bogus) string then it will eventually fail. The bogus string is really a separate issue and is exposing what is a long standing bug in the pattern matcher.
Created attachment 10039 [details] Patch to track recursion properly in the pattern matcher This patch should fix the pattern matcher to track recursion properly. It doesn't address the question of where the bogus symbol is coming from.
This patch works for me. Thanks.
The problem with the bad name is probably due to supp_matches_callers() not checking whether it managed to get the object or function name before trying to match it so it will sometimes try and match garbage.
Created attachment 10041 [details] Patch to not try and match bogus names This patch stops the error handling code trying to match object and function names against a suppression if it wasn't able to obtain the relevant name.
Can you try that new patch without the previous match to see if that also work? Thanks.
Second patch alone works for me too.
Would it be possible to come up with a regtest case for this?
I could probably come up with one for the symbol matcher bug but it's going to be quite hard to come up with one for the other bug - you might be able to deliberately create a program with no symbols but by definition what follows is based on reading uninitialised memory so there is no way to control how the bug might manifest itself.
Created attachment 10052 [details] Test case for string matcher assertion This is a test case for the string matcher assertion. I will post the suppression file to go with it in a minute. To test run as follows: valgrind --tool=memcheck --suppressions=longsym.supp ./longsym
Created attachment 10053 [details] Source for test case Actually that other attachment was the suppressions file, this is the source code.
Fix checked in. I'll push out -rc2 soon (this evening, especially if Nick can fix Massif).
*** Bug has been marked as fixed ***.