Bug 101173 - Assertion `recDepth >= 0 && recDepth < 500' failed
Summary: Assertion `recDepth >= 0 && recDepth < 500' failed
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-03-09 13:56 UTC by Aleksander Salwa
Modified: 2005-03-10 03:05 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
log from Valgrind with additional messages (217.91 KB, text/plain)
2005-03-09 14:51 UTC, Aleksander Salwa
Details
Patch to track recursion properly in the pattern matcher (839 bytes, patch)
2005-03-09 15:16 UTC, Tom Hughes
Details
Patch to not try and match bogus names (995 bytes, patch)
2005-03-09 16:09 UTC, Tom Hughes
Details
Test case for string matcher assertion (49 bytes, text/plain)
2005-03-10 01:24 UTC, Tom Hughes
Details
Source for test case (1.22 KB, text/plain)
2005-03-10 01:25 UTC, Tom Hughes
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Aleksander Salwa 2005-03-09 13:56:14 UTC
I tried 2.4.1.rc1 on my big multithreaded app:

[asalwa@valgrind linux_x86]$ /opt/valgrind-2.4.0.rc1/bin/valgrind
--suppressions=/home/valgrind/mhp4linux/mhp.supp --error-limit=no
--leak-check=full --leak-resolution=high --freelist-vol=80000000
./mhp10probe_internal_linux_x86 -stream ~/strumienie/gen_20040311_17x.tsop
-no_shifted_pointers 1
==31371== Memcheck, a memory error detector for x86-linux.
==31371== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==31371== Using valgrind-2.4.0.rc1, a program supervision framework for x86-linux.
==31371== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==31371== For more details, rerun with: -v
==31371==
Report(): REPORT_INFO in file ../../impl/emu_linux/comm_sock.h, line 101,
function CommSocket, OS function connect, message "connected to stbproxy, ready"
==31371== Conditional jump or move depends on uninitialised value(s)
==31371==    at 0x81375CA: Win_MainMHPInitialization (MultiVideoStackImpl.h:596)
==31371==    by 0x8129ADA: main (win_winmain.cpp:88)

valgrind: vg_mylibc.c:1060 (string_match_wrk): Assertion `recDepth >= 0 &&
recDepth < 500' failed.
==31371==    at 0xB003727B: vgPlain_skin_assert_fail (vg_mylibc.c:1168)
==31371==    by 0xB003727A: assert_fail (vg_mylibc.c:1164)
==31371==    by 0xB00372D5: vgPlain_core_assert_fail (vg_mylibc.c:1175)
==31371==    by 0xB003709C: string_match_wrk (vg_mylibc.c:1060)
==31371==    by 0xB0037103: string_match_wrk (vg_mylibc.c:1066)
==31371==    by 0xB0037170: vgPlain_string_match (vg_mylibc.c:1097)
==31371==    by 0xB001C56B: supp_matches_callers (vg_errcontext.c:991)
==31371==    by 0xB001C5C4: is_suppressible_error (vg_errcontext.c:1009)
==31371==    by 0xB001B9CC: vgPlain_maybe_record_error (vg_errcontext.c:552)
==31371==    by 0xB7BBB153: vgMAC_record_param_error (mac_needs.c:481)
==31371==    by 0xB7BBD7DF: mc_check_is_readable (mc_main.c:731)
==31371==    by 0xB006E060: vgSkinInternal_pre_mem_read (vg_toolint.c:373)

recDepth is global static variable. Are you sure it is correctly synchronized ?
Comment 1 Aleksander Salwa 2005-03-09 13:59:45 UTC
It is easy to repeat in my environment. It happens every time, after a few seconds.
If I remove mentioned assertion, then Valgrind goes further and my app works well.
Comment 2 Tom Hughes 2005-03-09 14:12:14 UTC
The most obvious reason for this assertion would be that it is trying to match a wildcard in a suppression rule against a symbol more than 500 or so characters long. The assertion is just a safety check against run away recursion and you should be able to increase 500 to something larger.

I suspect the reason you are now triggering this (the code in question hasn't changed) is that we now default to longer backtraces so it is probably seeing a symbol that it didn't before.
Comment 3 Aleksander Salwa 2005-03-09 14:49:42 UTC
In such case 500 occurences of string_match_wrk should be visible on the callstack. But there are only 2.
I investigated it in more detail. str passed to string_match_wrk is a binary garbage. It seems that caller_name prepared in supp_matches_callers is broken for some reason.
I added following lines in string_match_wrk before assertion:

   VG_(printf)("running_tid==%d, recDepth==%d\n", running_tid, recDepth);
   if(recDepth==0) {
       VG_(printf)("pat==\"%s\", str==\"%s\"\n", pat, str);
   }

(I had to make running_tid extern to make it compilable, of course).

I will attach log with these additional messages in a few seconds.
Comment 4 Aleksander Salwa 2005-03-09 14:51:49 UTC
Created attachment 10038 [details]
log from Valgrind with additional messages
Comment 5 Tom Hughes 2005-03-09 15:12:59 UTC
It shouldn't really matter what str is - it only recuses whan a * is seen in the pattern so unless you have 500 of those it shouldn't hit the limit. The bug is that the first case in the switch doesn't decrement recDepth so if you have one or more * characters in the pattern and a long (posibly bogus) string then it will eventually fail.

The bogus string is really a separate issue and is exposing what is a long standing bug in the pattern matcher.
Comment 6 Tom Hughes 2005-03-09 15:16:04 UTC
Created attachment 10039 [details]
Patch to track recursion properly in the pattern matcher

This patch should fix the pattern matcher to track recursion properly. It
doesn't address the question of where the bogus symbol is coming from.
Comment 7 Aleksander Salwa 2005-03-09 15:37:43 UTC
This patch works for me. Thanks.
Comment 8 Tom Hughes 2005-03-09 16:07:43 UTC
The problem with the bad name is probably due to supp_matches_callers() not checking whether it managed to get the object or function name before trying to match it so it will sometimes try and match garbage.
Comment 9 Tom Hughes 2005-03-09 16:09:52 UTC
Created attachment 10041 [details]
Patch to not try and match bogus names

This patch stops the error handling code trying to match object and function
names against a suppression if it wasn't able to obtain the relevant name.
Comment 10 Tom Hughes 2005-03-09 16:10:38 UTC
Can you try that new patch without the previous match to see if that also work? Thanks.
Comment 11 Aleksander Salwa 2005-03-09 17:35:45 UTC
Second patch alone works for me too.
Comment 12 Jeremy Fitzhardinge 2005-03-09 19:01:39 UTC
Would it be possible to come up with a regtest case for this?
Comment 13 Tom Hughes 2005-03-09 20:09:40 UTC
I could probably come up with one for the symbol matcher bug but it's going to be quite hard to come up with one for the other bug - you might be able to deliberately create a program with no symbols but by definition what follows is based on reading uninitialised memory so there is no way to control how the bug might manifest itself.
Comment 14 Tom Hughes 2005-03-10 01:24:53 UTC
Created attachment 10052 [details]
Test case for string matcher assertion

This is a test case for the string matcher assertion. I will post the
suppression file to go with it in a minute. To test run as follows:

valgrind --tool=memcheck --suppressions=longsym.supp ./longsym
Comment 15 Tom Hughes 2005-03-10 01:25:57 UTC
Created attachment 10053 [details]
Source for test case

Actually that other attachment was the suppressions file, this is the source
code.
Comment 16 Jeremy Fitzhardinge 2005-03-10 03:05:39 UTC
Fix checked in.  I'll push out -rc2 soon (this evening, especially if Nick can fix Massif).
Comment 17 Jeremy Fitzhardinge 2005-03-10 03:05:55 UTC
*** Bug has been marked as fixed ***.