| Summary: | drd false positives on newly allocated memory | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Bradley C. Kuszmaul <kuszmaul> |
| Component: | drd | Assignee: | Bart Van Assche <bart.vanassche+kde> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | | |
| Priority: | NOR | | |
| Version: | 3.7.0 | | |
| Target Milestone: | --- | | |
| Platform: | Fedora RPMs | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed In: | |
| Sentry Crash Report: | | | |
Description (Bradley C. Kuszmaul, 2012-03-30 21:18:05 UTC)
Comment 1 (Bart Van Assche)

Which command line options were passed to drd? Probably at least --check-stack-var=yes?

Comment 2 (Bradley C. Kuszmaul)

No arguments, just valgrind --tool=drd ./test. If I add --check-stack-var=yes the reported errors seem to go away. I guess I don't understand drd very well: how can checking more things result in fewer reported errors?

Comment 3 (Bart Van Assche)

Good question. With --check-stack-var=yes DRD not only enables data race detection on stack variables but also cleans up stack memory references every time the stack pointer is increased (stacks grow downward on x86). So if any stack object has been shared over threads without ANNOTATE_NEW_MEMORY() having been invoked after the last use of that object, then with --check-stack-var=no false positives will be reported on new stack variables that later reuse the same stack memory addresses.

Comment 4 (Bradley C. Kuszmaul)

Perhaps the manual should be updated with that explanation, at which point I would consider this issue resolved.

It looks like --check-stack-var=yes makes my program run much slower than without it: it ran all night without finishing the first second of computation.

Comment 5 (Bradley C. Kuszmaul)

One strange thing about this problem is that the memory being complained about is malloc()'d. Could it be the case that a pthread stack was freed (because the pthread finished) and then the same memory got used for a malloc()'d object? Or maybe the other way around? Is there a way to not complain about stack variables that are shared?

Comment 6 (Bart Van Assche)

(In reply to comment #5)
> Could it be the case that a pthread stack was freed (because the pthread
> finished) and then the same memory got used for a malloc()'d object?

pthread stacks are already cleaned up as soon as a thread finishes. This is done by the drd_stop_using_mem() call in drd_thread_finished().

(In reply to comment #5)
> Is there a way to not complain about stack variables that are shared?

If there are not too many stack variables that are shared, one way is to insert ANNOTATE_NEW_MEMORY() annotations in the scope where the stack variable is defined, after the last use of such a variable. If you are using C++, the end of the destructor is a good place to insert such annotations.
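To make the suggestion in comment #6 concrete, here is a minimal sketch of annotating a shared stack variable after its last use. This is not code from the report; the worker() function and shared_flag variable are made up for illustration:

```c
#include <pthread.h>
#include <stddef.h>
#include <valgrind/drd.h>   /* provides ANNOTATE_NEW_MEMORY() */

static void *worker(void *arg)
{
    int *flag = arg;
    *flag = 1;   /* this thread writes to the other thread's stack variable */
    return NULL;
}

void run_example(void)
{
    pthread_t t;
    int shared_flag = 0;   /* stack variable shared over threads */

    pthread_create(&t, NULL, worker, &shared_flag);
    pthread_join(t, NULL);

    /* After the last use, tell DRD to discard the access history of this
     * stack memory, so that later reuse of the same addresses by other
     * stack variables is not reported as a race. In C++, the end of a
     * destructor is a good place for this call. */
    ANNOTATE_NEW_MEMORY(&shared_flag, sizeof(shared_flag));
}

int main(void)
{
    run_example();
    return 0;
}
```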
Comment 7 (Bart Van Assche)

(In reply to comment #4)
> Perhaps the manual should be updated with that explanation, at which point I
> would consider this issue resolved.

I've taken another approach in r12474: the changes in that revision cause drd not to complain about stack variables that are shared over threads... at least with --check-stack-var=no.

Comment 8 (Bradley C. Kuszmaul)

I ran r12475 with --check-stack-var=no. My original problem is still there. I don't think that running with --check-stack-var=yes actually solved the problem; instead it just slowed down the test so much that the test timed out before the warning was emitted.

So I've still got some problem where I malloc() some memory and then initialize it (and at that point no other thread knows about that memory). Somehow the memory returned by malloc() sometimes retains conflict information from its previous life. If I put an ANNOTATE_NEW_MEMORY() call after every malloc(), the problem goes away. It seems like somehow malloc() isn't treating the new memory as new in some cases.

Comment 9 (Bart Van Assche)

Which malloc() implementation are you using? glibc, tcmalloc, or another implementation? Using the --trace-alloc=yes option might help to verify that DRD actually "sees" the malloc() calls invoked from the application.

Comment 10 (Bradley C. Kuszmaul)

I'm using glibc.

Comment 11 (Bradley C. Kuszmaul)

When I turn on --trace-alloc I see that:

a) the memory was recently released,
b) the memory was then reallocated using malloc(), and
c) the initialization triggers a complaint.

Comment 12 (Bradley C. Kuszmaul)

It looks to me like there may be a race in drd_malloc_wrappers.c. When I look at r12467 at drd_malloc_wrappers.c:135 (in freelike_block()), I see:

```c
if (dealloc)
   VG_(cli_free)((void*)p);
if (mc->size > 0)
   s_stop_using_mem_callback(mc->data, mc->size);
VG_(HT_remove)(s_malloc_list, (UWord)p);
VG_(free)(mc);
```

So it looks like the pointer is freed first, and only then does drd record that the memory is no longer in use. I read that as a race condition, because another thread can malloc() the memory between the cli_free() and the s_stop_using_mem_callback().

Comment 13 (Bart Van Assche)

(In reply to comment #12)
> So it looks like the pointer is freed first, and only then does drd record
> that the memory is no longer in use. I read that as a race condition,
> because another thread can malloc() the memory between the cli_free() and
> the s_stop_using_mem_callback().

Inside Valgrind, the malloc() and free() intercepts are invoked on behalf of client code and hence with the "big lock" held. The only client code invoked without the big lock held is a blocking system call, and mmap() is considered by Valgrind to be a non-blocking system call. So although I have swapped these two statements on the trunk, I do not expect that this change will cause a behavior change in drd.

Comment 14 (Bradley C. Kuszmaul)

Yes, r12478 does not fix the problem, and the corresponding change to realloc() produces no change either. But adding an explicit ANNOTATE_NEW_MEMORY(p, size) in my code after the malloc() does change the behavior (see the wrapper sketch at the end of this report).

It's acting "racy" in that it's slightly different on every run, and it's infrequent (a few times in a hundred thousand mallocs and frees).

Comment 15 (Bart Van Assche)

(In reply to comment #14)
> But adding an explicit ANNOTATE_NEW_MEMORY(p, size) in my code after the
> malloc() does change the behavior.
>
> It's acting "racy" in that it's slightly different on every run, and it's
> infrequent (a few times in a hundred thousand mallocs and frees).

It would help if you could make a small test program available that triggers this behavior. That would also rule out the possibility that the reported race is caused by a use-after-free in the client program.

Comment 16 (Bart Van Assche)

Closing this issue because it's not clear whether it's caused by the client program or by the tool. Please reopen once more information can be provided.

Comment 17 (Bart Van Assche)

Should be fixed by r12629.
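The workaround described in comment #14 amounts to wrapping malloc() so that DRD explicitly treats every returned block as fresh memory. A minimal sketch follows; the annotated_malloc() name is hypothetical and not from the report:

```c
#include <stdlib.h>
#include <valgrind/drd.h>   /* provides ANNOTATE_NEW_MEMORY() */

/* Hypothetical wrapper: allocate memory and explicitly tell DRD that the
 * returned block is new, discarding any stale access history that may be
 * associated with these addresses from a previous free()'d allocation. */
void *annotated_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ANNOTATE_NEW_MEMORY(p, size);
    return p;
}
```

Per comments #8 and #14, calling an annotation like this after every malloc() made the warnings disappear in the reporter's runs, which is what suggested that DRD was retaining stale per-address state across free()/malloc() cycles rather than detecting a real race in the client program.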