Summary: | Helgrind: implementation of ANNOTATE_HAPPENS_BEFORE() / AFTER() is not correct | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Bart Van Assche <bart.vanassche+kde> |
Component: | helgrind | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dvyukov, eugeni.stepanov, konstantin.s.serebryany, timurrrr |
Priority: | NOR | ||
Version: | 3.6 SVN | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Bart Van Assche
2010-07-08 12:25:35 UTC
This happens because on DRD, we wind up making a clreq VG_USERREQ__DRD_ANNOTATE_HAPPENS_BEFORE(addr) but on Helgrind, _VG_USERREQ__HG_USERSO_SEND_PRE(addr) and _VG_USERREQ__HG_USERSO_SEND_PRE == VG_USERREQ_TOOL_BASE('H','G') + 256 + about 30 whereas VG_USERREQ__DRD_ANNOTATE_HAPPENS_BEFORE == VG_USERREQ_TOOL_BASE('H','G') + 256 + 33 Hmm. So actually it seems like it should work. Need to investigate more. Yeah, you're right. For _HAPPENS_AFTER, DRD generates h-b edges back to all points that have ever done _HAPPENS_BEFORE on the object. Helgrind generates only one h-b edge, back to the most recent thread to do _HAPPENS_BEFORE on the object. I can see DRD's behaviour is the right one for annotating thread safe reference counting etc. So I can modify Helgrind to match. But .. is there any situation in which we would want to say "forget about any previous _HAPPENS_BEFORE calls on the given object" ? It seems somehow unclean that we don't have a way to say that. Specifically, suppose that the memory address is eventually re-used as part of a new object. I don't want h-b edges from any older use of that memory to get mixed up with h-b edges from newer uses. (If you see what I mean). I'm sure we've had a discussion on this issue at some point in the past (possibly w/ Konstantin), but I can't remember where now. kcc, can you comment on comment #2 2nd para? This is how we annotate ref counting: http://code.google.com/p/data-race-test/wiki/DynamicAnnotations#Reference_counting We never found a need to 'forget' previous happens_before events. Do you have a *realistic* example where this is needed to avoid false negative? (In reply to comment #4) > This is how we annotate ref counting: > http://code.google.com/p/data-race-test/wiki/DynamicAnnotations#Reference_counting > > We never found a need to 'forget' previous happens_before events. > Do you have a *realistic* example where this is needed to avoid false negative? There is something missing in the lockless example, namely a memory barrier before and after AtomicDecrementByOne(). See e.g. this discussion: http://old.nabble.com/Tweaking-atomics-and-memory-barriers-in-shared_ptr-td13442631.html. Kostya, What about a custom allocator with free-list-like pool of RefCounted<> objects of the same size? The refcounter memory locations will be re-used over and over. I think we can flush the H-B signal map (signaller_map_) once a memory address gets deleted (e.g. HandleFree). Shouldn't be a problem, right? Bart, in real Google code our AtomicDecrementByOne is barrier'ed :) (In reply to comment #4) > We never found a need to 'forget' previous happens_before events. > Do you have a *realistic* example where this is needed to avoid false negative? No, I don't. But to turn the question around, do you have a formal or semi-formal argument (proof) that not 'forgetting' previous events is never going to lead to false negatives? Personally, I'm lazy :) and if it's difficult to construct such a proof, I'd prefer to implement a _FORGET_ macro, on the basis that it's easier than proving it is never necessary. Bart: of course, I assumed that AtomicDecrementByOne() implements all required barriers. You may check the way it is done in our code: http://src.chromium.org/viewvc/chrome/trunk/src/base/atomic_ref_count.h?view=markup Timur: if we have a free-list of ref counted objects (or malloc/free serves as such free list) then the dispose/reuse pair (i.e. free/malloc) will also create a hb-arc. So, having an extra hb-arc from the previous life of the ref counted object will not add anything. Julian: no, no proof. However, the most frequent case (as above, the object is reused after free and following malloc) is prone to false negatives. my bad English... s/prone to false negatives/protected from false negatives/ (In reply to comment #8) > But to turn the question around, do you have a formal > or semi-formal argument (proof) that not 'forgetting' previous events > is never going to lead to false negatives? Personally, I'm lazy :) > and if it's difficult to construct such a proof, I'd prefer to > implement a _FORGET_ macro, on the basis that it's easier than > proving it is never necessary. Existing mechanisms (freeing memory) should be sufficient. Additionally, if a user inserts such a _FORGET_ macro at the wrong place that can result in a false negative, isn't it ? (In reply to comment #4) > We never found a need to 'forget' previous happens_before events. A different reason to have a 'forget' macro is to help avoid resource leaks in the tools. It could happen that an application creates a large number of objects subject to thread-safe reference counting. Then HAPPENS_BEFORE / HAPPENS_AFTER will be called on a large number of different addresses. At least in Helgrind, each such call will result in a permanent reference to a vector timestamp being created (2, in fact). After a while these come to occupy a large amount of memory. (This really happened to me.) A 'forget' macro could therefore help, because the tool can release all resources (including VTSs) associated with the tracked address. Now, you could argue that such forgetting and resource releasing could instead be automatically done, when memory is deallocated. But: (1) that assumes the arguments to HAPPENS_BEFORE / HAPPENS_AFTER are memory addresses. But they could be completely arbitrary machine words; nothing says that they have to be interpreted as memory addresses. (2) it's expensive, to have to scan a mapping potentially containing hundreds of thousands of addresses, to see if any fall inside an address range that is just about to be freed. (In reply to comment #11) > Additionally, if a > user inserts such a _FORGET_ macro at the wrong place that can result in a > false negative, isn't it ? That is true. But you can also cause false negatives by putting HAPPENS_BEFORE / HAPPENS_AFTER in the wrong place. So I don't think that the fact that the macro can be incorrectly used is a good argument for not having it. I partially agree with Julian about the resource problem. For the hb objects that reside in heap we can implement fast deletion, but that won't be zero cost. Also agree that the hb object could be an arbitrary word, although I haven't seen cases where such words would be numerous. Yet another point: ThreadSanitizer flushes state when it runs out of resources -- so we never observed this problem in the wild (yes, you can write a simple stress test that will exercise it). However, I would not expect that this macro (if added) will be used by anyone who does not know the gory details of race detector. A side note: Dmitry Vyukov (who may want to comment here as well) mentioned that the current two annotations, HAPPENS_BEFORE/HAPPENS_AFTER are not good enough to annotate synchronization done by atomic ReleaseStore and AcquireLoad functions. Currently, if two threads do HAPPENS_BEFORE(x) and a the third thread does HAPPENS_AFTER(x), the detectors will draw 2 arcs: T1=>T3 and T2=>T3. This behavior is correct if the hb-relation is introduced by mutex operations or a CAS, but not correct if introduced by ReleaseStore/AcquireLoad. For these we need to add either T1=>T3 *or* T2=>T3 (whichever ReleaseStore happened last). Dmitry, did I get the idea right? I am still not convinced that this justifies adding one more annotation though :)
> For these we need to add either T1=>T3 *or* T2=>T3 (whichever ReleaseStore
> happened last).
You could _almost_ implement that semantics if a FORGET function was
available, like this
T1, T2: ReleaseStore(x); FORGET(x); HAPPENS_BEFORE(x);
T2: HAPPENS_AFTER(x); AcquireLoad(x);
By putting a FORGET immediately before the HAPPENS_BEFORE, we get
just one h-b arc, from whichever of T1 or T2 runs last.
I say "almost" because the FORGET and HAPPENS_BEFORE need to be
atomic, and obviously they aren't, in this example.
Duh, I meant T3: HAPPENS_AFTER(x); AcquireLoad(x); yea :) Also, HAPPENS_BEFORE should go *before* the store and HAPPENS_AFTER should go *after* the load. (In reply to comment #14) > A side note: Dmitry Vyukov (who may want to comment here as well) mentioned > that the current two annotations, HAPPENS_BEFORE/HAPPENS_AFTER are not good > enough to annotate synchronization done by atomic ReleaseStore and AcquireLoad > functions. IMHO load-acquire and store-release should be recognized by VEX and handled similarly to Ist_LLSC instead of being annotated with client requests. >> MHO load-acquire and store-release should be recognized by VEX In general you can't because e.g. on x86 load-acquire and store-release are just regular load and store once you get to assembly. The difference is in the compiler barriers which are present in the code but not in the binary. See line 127 in http://src.chromium.org/viewvc/chrome/trunk/src/base/atomicops_internals_x86_gcc.h?annotate=53716 Here's a revised proposal. I think it covers all scenarios discussed so far. Both macros get an extra Boolean argument: HAPPENS_BEFORE( unsigned long tag, bool forget_before ) When forget_before == false, behaviour is as present: a h-b arc starting at this point is added to the set associated with 'tag'. When forget_before == true, any previous h-b arcs associated with 'tag' are first removed, and then one from this point is added. This allows support of ReleaseStore/AcquireLoad discussed in comments 14 and 15 without atomicity problems. IOW, the bool allows the caller to distinguish the 'overwrite' vs 'add' scenarios. The inability to do this, + ambiguity in the spec, was the original cause of this bug report. HAPPENS_AFTER( unsigned long tag, bool forget_after ) When forget_after == false, behaviour is as present: this thread acquires h-b arcs from any associated with 'tag'. When forget_after == true, this thread acquires h-b arcs from any associated with 'tag', and then all h-b arcs in 'tag' are removed. This allows checkers to atomically discard resources associated with 'tag' if they want. IOW, the bool allows the caller to specify whether or not this is the last use of 'tag', and the tool to release resources accordingly. (In reply to comment #17) > Also, HAPPENS_BEFORE should go *before* the store and HAPPENS_AFTER should go > *after* the load. I always have trouble figuring that bit out. Looks like I guessed wrong here :-) (In reply to comment #19) > >> MHO load-acquire and store-release should be recognized by VEX > In general you can't because e.g. on x86 load-acquire and store-release are > just regular load and store once you get to assembly. The difference is in the > compiler barriers which are present in the code but not in the binary. > > See line 127 in > http://src.chromium.org/viewvc/chrome/trunk/src/base/atomicops_internals_x86_gcc.h?annotate=53716 I know. My concern was about those architectures that support load-acquire and store-release at the instruction level. But I'm not sure any such architecture is already supported by VEX ? (In reply to comment #22) > those architectures that support load-acquire and store-release > at the instruction level. Which architectures support that? >> Here's a revised proposal.
First impression: I like it! (Of course, we'll need to keep the one argument variant for compatibility and simplicity).
But please give me some time in case I have another impression :)
(In reply to comment #24) > (Of course, we'll need to keep the one argument > variant for compatibility and simplicity). Yes. I was wondering about that too. My c-preprocessor expertise is not good enough to know whether it's OK to add new 2-arg versions of the existing macros (I suspect not), or whether we will need new names. Do you know? I guess we have some luck: http://en.wikipedia.org/wiki/Variadic_macro: ... However, macros can be written to count the number of arguments that have been passed. Hi, First of all, I see two separate problems here. The first problem is resource freeing for hb objects, and the second is preciseness of verification (false positives/negatives). As for the first problem, IMVHO it should be handled [mostly] transparently (users just won't put annotations for that). For hb objects allocated in dynamic memory, it definitely must be done in free(). I believe it can be made sufficiently efficient, for example, one possible strategy would be to collect all hb objects situated in a memory block in a list located in a memory block header. As for stack-based hb objects situation is more problematic, the only realistic strategy that I can think of is to periodically/episodically probe stack pointer and flush everything "above" it. However, note that *not* flushing hb objects results not only in resource leaks, but also affects correctness (false negatives)(think of a situation when a hb object as if reused when it actually must not be reused), so ThreadSanitizer is also affected. Annotations for hb object creation/destruction should be provided as well (ANNOTATE_ATOMIC_INIT(id)/ANNOTATE_ATOMIC_DESTROY(id)). First of all, they will handle rare cases of non-address hb objects (for example, file fd can be used to synchronize write(fd)->read(fd)). They will also help handle cases when objects are reused in a type-stable manner via pooling, and other corner cases. Regarding preciseness/correctness. First, HAPPENS_BEFORE/AFTER must be executed atomically not only with regard to FORGET part but also with respect to the memory operation itself. Depending on what strategy you use (drawing hb edges from (1) all previous or (2) the last release to acquire) non-atomic execution of an annotation and respective memory access leads to either false positives or false negatives. Then, what hb edges must be drawn depends on type of memory operation being performed (load/store/rmw) and on associated memory ordering (relaxed/acquire/release/acq_rel/seq_cst) and are rather tricky. One additional flag won't solve that. One really does not want to offload the task of determining what hb edges must drawn to a user. However, we may ask a user just what operation he is performing. All that leads me to a conclusion that annotations ought to mimic interface of a fine-grained library of atomic operations: ATOMIC_INIT(id); ATOMIC_DESTROY(id); ATOMIC_STORE_RELEASE(id, val); ATOMIC_LOAD_ACQUIRE(id); ... ATOMIC_FETCH_ADD_ACQUIRE(id, val); ... ATOMIC_COMPARE_EXCAHNGE_WEAK_ACQ_REL(id, cmp, xchg); ... etc Note that these macros perform memory operations itself so that they are atomic with operations on shadow state so that real and shadow state do not get out of sync. (In reply to comment #23) > (In reply to comment #22) > > those architectures that support load-acquire and store-release > > at the instruction level. > > Which architectures support that? IA-64 has ld.acq and st.rel instructions. However it's generally unsolvable problem on binary level. Load-acquire is "downgraded" to plain MOV on x86, while relaxed compare and swap is "promoted" to sequentially consistent LOCK CMPXCHG. Not saying that some (potentially racy) memory accesses can be eliminated from binary code. However, what would be really nice is do that on source level. That is, understand semantics of source code w/o any annotations (like it's done for malloc/free/pthread_xxx/etc). Of course, source code should be expressed in fine-grained terms for it to be possible. C1x/C++0x atomics form a good basis for it, and all other interfaces should be mapped to C1x/C++0x atomics (which is basically the same as annotations). (In reply to comment #27) > The first problem is resource freeing for hb objects > [...] > As for the first problem, IMVHO it should be handled [mostly] > transparently Why? Handling it transparently and efficiently adds implementation complexity into systems which are already complex -- at least, too complex for my liking. And with the proposal in comment 20, you get it for free anyway. A minimal fix for the original bug (mis-implementation of ANNOTATE_HAPPENS_BEFORE in Helgrind) has been committed as r11624. (In reply to comment #29) > (In reply to comment #27) > > The first problem is resource freeing for hb objects > > [...] > > As for the first problem, IMVHO it should be handled [mostly] > > transparently > > Why? Handling it transparently and efficiently adds implementation > complexity into systems which are already complex -- at least, too > complex for my liking. And with the proposal in comment 20, you get > it for free anyway. And... so... why you mess with all that instrumentation stuff at all? We can ask a user to annotate all memory accesses as well ;) Provided extensive dependence on annotations it's either a way too burdensome for a user, or it's not done, or done incorrectly, or both. As for the particular proposal, HAPPENS_BEFORE( unsigned long tag, bool forget_before ) makes sense, because it relates to either store-release or rmw-release (cross fingers that all users are able to figure that out). HAPPENS_AFTER( unsigned long tag, bool forget_after ) is rather strange, because it relates to load-acquire followed by destroy, do not see any reason to strictly tie them together (what if I want to just destroy hb object w/o doing any memory operation?). So how should I annotate compare_exchange_weak(a, cmp, xchg, memory_order_release, memory_order_acquire); in my code using the proposal? (In reply to comment #20) > Here's a revised proposal. I think it covers all scenarios discussed > so far. Both macros get an extra Boolean argument: > > > HAPPENS_BEFORE( unsigned long tag, bool forget_before ) > > When forget_before == false, behaviour is as present: a h-b arc > starting at this point is added to the set associated with 'tag'. > > When forget_before == true, any previous h-b arcs associated with > 'tag' are first removed, and then one from this point is added. This > allows support of ReleaseStore/AcquireLoad discussed in comments 14 > and 15 without atomicity problems. > > IOW, the bool allows the caller to distinguish the 'overwrite' vs > 'add' scenarios. The inability to do this, + ambiguity in the spec, > was the original cause of this bug report. > > > > HAPPENS_AFTER( unsigned long tag, bool forget_after ) > > When forget_after == false, behaviour is as present: this thread > acquires h-b arcs from any associated with 'tag'. > > When forget_after == true, this thread acquires h-b arcs from any > associated with 'tag', and then all h-b arcs in 'tag' are removed. > This allows checkers to atomically discard resources associated > with 'tag' if they want. > > IOW, the bool allows the caller to specify whether or not this is the > last use of 'tag', and the tool to release resources accordingly. I'm not sure that anyone except a few experts will be able to remember the meaning of that second argument. Most people who see such an annotation in source code will have to look up what the meaning of that second argument is and most people who want to insert such an annotation will have to look up an example or documentation before being able to choose the correct value for the second argument. (In reply to comment #31) > HAPPENS_AFTER( unsigned long tag, bool forget_after ) is rather strange, > because it relates to load-acquire followed by destroy, do not see any reason > to strictly tie them together If HAPPENS_AFTER and FORGET are separate (non-atomic), then there is the possibility that the tag is re-used for some other operation by some other thread, before the FORGET happens. This would be an error. > (what if I want to just destroy hb object w/o > doing any memory operation?). Yes, this is a good point. In fact I would like to have a FORGET(tag) macro so as to be more general, as well as the 2-arg H_B and H_A macros proposed. This is despite the fact that at the moment I don't have a scenario where it would be strictly necessary. (In reply to comment #33) > (In reply to comment #31) > > > HAPPENS_AFTER( unsigned long tag, bool forget_after ) is rather strange, > > because it relates to load-acquire followed by destroy, do not see any reason > > to strictly tie them together > > If HAPPENS_AFTER and FORGET are separate (non-atomic), then there is > the possibility that the tag is re-used for some other operation by > some other thread, before the FORGET happens. This would be an error. May you provide an example? AFAIS, nobody can reuse an id, because the resource is not yet freed. Consider: HAPPENS_AFTER_AND_FORGET(obj->rc); free(obj); It's OK to split it into: HAPPENS_AFTER(obj->rc); FORGET(obj->rc); free(obj); Actually, FORGET must be placed "just before free()" rather than "straight after HAPPENS_AFTER". Remember that there are other usages of atomic objects than life-time management (reference counting). > > (what if I want to just destroy hb object w/o
> > doing any memory operation?).
>
> Yes, this is a good point. In fact I would like to have a FORGET(tag)
> macro so as to be more general, as well as the 2-arg H_B and H_A
> macros proposed. This is despite the fact that at the moment I don't
> have a scenario where it would be strictly necessary.
What do you mean by "it" - separate FORGET() or H_A + flag?
Fixed in version 3.7.0. |