Bug 243935 - Helgrind: implementation of ANNOTATE_HAPPENS_BEFORE() / AFTER() is not correct
Summary: Helgrind: implementation of ANNOTATE_HAPPENS_BEFORE() / AFTER() is not correct
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: helgrind (show other bugs)
Version: 3.6 SVN
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-08 12:25 UTC by Bart Van Assche
Modified: 2012-01-16 16:48 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bart Van Assche 2010-07-08 12:25:35 UTC
Version:           3.6 SVN (using KDE 4.4.4) 
OS:                Linux

While no races should be reported for the test program drd/tests/annotate_smart_pointer, Helgrind does report a race. Note: the ANNOTATE_HAPPENS_BEFORE() / AFTER() client requests in drd.h have been defined such that these are by both Helgrind and DRD.

$ svn info .|grep Revision
Revision: 11170
$ ./vg-in-place --tool=drd drd/tests/annotate_smart_pointer
==29720== drd, a thread error detector
==29720== Copyright (C) 2006-2010, and GNU GPL'd, by Bart Van Assche.
==29720== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==29720== Command: drd/tests/annotate_smart_pointer
==29720==
Done.
==29720==
==29720== For counts of detected and suppressed errors, rerun with: -v
==29720== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 14)
[bart@cell-user valgrind]$ ./vg-in-place --tool=helgrind drd/tests/annotate_smart_pointer
==29726== Helgrind, a thread error detector
==29726== Copyright (C) 2007-2010, and GNU GPL'd, by OpenWorks LLP et al.
==29726== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==29726== Command: drd/tests/annotate_smart_pointer
==29726==
==29726== Thread #1 is the program's root thread
==29726==
==29726== Thread #2 was created
==29726==    at 0x333D6E627E: clone (in /lib64/libc-2.9.so)
==29726==    by 0x333E206949: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.9.so)
==29726==    by 0x4A0B08E: pthread_create_WRK (hg_intercepts.c:257)
==29726==    by 0x4A0B18E: pthread_create@* (hg_intercepts.c:288)
==29726==    by 0x400ED9: main (annotate_smart_pointer.cpp:145)
==29726==
==29726== Possible data race during write of size 4 at 0x514b068 by thread #1
==29726==    at 0x400F38: main (annotate_smart_pointer.cpp:273)
==29726==  This conflicts with a previous write of size 4 by thread #2
==29726==    at 0x4013BD: thread_func(void*) (annotate_smart_pointer.cpp:287)
==29726==    by 0x4A0B217: mythread_wrapper (hg_intercepts.c:221)
==29726==    by 0x333E2073D9: start_thread (in /lib64/libpthread-2.9.so)
==29726==    by 0x333D6E62BC: clone (in /lib64/libc-2.9.so)
==29726==  Address 0x514b068 is 40 bytes inside a block of size 48 alloc'd
==29726==    at 0x4A06EB1: operator new(unsigned long) (vg_replace_malloc.c:261)
==29726==    by 0x400D61: main (annotate_smart_pointer.cpp:316)
==29726==
Done.
==29726==
==29726== For counts of detected and suppressed errors, rerun with: -v
==29726== Use --history-level=approx or =none to gain increased speed, at
==29726== the cost of reduced accuracy of conflicting-access information
==29726== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 5 from 4)


Reproducible: Always

Steps to Reproduce:
See above.

Actual Results:  
See above.

Expected Results:  
See above.
Comment 1 Julian Seward 2011-02-24 18:31:33 UTC
This happens because on DRD, we wind up making a clreq 
  VG_USERREQ__DRD_ANNOTATE_HAPPENS_BEFORE(addr)

but on Helgrind, _VG_USERREQ__HG_USERSO_SEND_PRE(addr)

and _VG_USERREQ__HG_USERSO_SEND_PRE
    == VG_USERREQ_TOOL_BASE('H','G') + 256 + about 30

whereas VG_USERREQ__DRD_ANNOTATE_HAPPENS_BEFORE 
== VG_USERREQ_TOOL_BASE('H','G') + 256 + 33

Hmm.  So actually it seems like it should work.  Need to
investigate more.
Comment 2 Julian Seward 2011-02-24 20:37:58 UTC
Yeah, you're right.  For _HAPPENS_AFTER, DRD generates h-b edges back
to all points that have ever done _HAPPENS_BEFORE on the object.
Helgrind generates only one h-b edge, back to the most recent thread
to do _HAPPENS_BEFORE on the object.

I can see DRD's behaviour is the right one for annotating thread safe
reference counting etc.  So I can modify Helgrind to match.  But ..
is there any situation in which we would want to say "forget about
any previous _HAPPENS_BEFORE calls on the given object" ?  It seems
somehow unclean that we don't have a way to say that.  Specifically,
suppose that the memory address is eventually re-used as part of a new
object.  I don't want h-b edges from any older use of that memory to
get mixed up with h-b edges from newer uses.  (If you see what I mean).

I'm sure we've had a discussion on this issue at some point in the
past (possibly w/ Konstantin), but I can't remember where now.
Comment 3 Julian Seward 2011-02-24 20:39:25 UTC
kcc, can you comment on comment #2 2nd para?
Comment 4 Konstantin Serebryany 2011-02-24 20:47:34 UTC
This is how we annotate ref counting: http://code.google.com/p/data-race-test/wiki/DynamicAnnotations#Reference_counting

We never found a need to 'forget' previous happens_before events. 
Do you have a *realistic* example where this is needed to avoid false negative?
Comment 5 Bart Van Assche 2011-02-24 20:54:26 UTC
(In reply to comment #4)
> This is how we annotate ref counting:
> http://code.google.com/p/data-race-test/wiki/DynamicAnnotations#Reference_counting
> 
> We never found a need to 'forget' previous happens_before events. 
> Do you have a *realistic* example where this is needed to avoid false negative?

There is something missing in the lockless example, namely a memory barrier before and after AtomicDecrementByOne(). See e.g. this discussion: http://old.nabble.com/Tweaking-atomics-and-memory-barriers-in-shared_ptr-td13442631.html.
Comment 6 Timur Iskhodzhanov 2011-02-24 21:03:14 UTC
Kostya,
What about a custom allocator with free-list-like pool of RefCounted<> objects of the same size? The refcounter memory locations will be re-used over and over.

I think we can flush the H-B signal map (signaller_map_) once a memory address gets deleted (e.g. HandleFree).
Shouldn't be a problem, right?
Comment 7 Timur Iskhodzhanov 2011-02-24 21:13:58 UTC
Bart,
in real Google code our AtomicDecrementByOne is barrier'ed :)
Comment 8 Julian Seward 2011-02-24 23:52:15 UTC
(In reply to comment #4)
> We never found a need to 'forget' previous happens_before events. 
> Do you have a *realistic* example where this is needed to avoid false negative?

No, I don't.  But to turn the question around, do you have a formal
or semi-formal argument (proof) that not 'forgetting' previous events
is never going to lead to false negatives?  Personally, I'm lazy :)
and if it's difficult to construct such a proof, I'd prefer to
implement a _FORGET_ macro, on the basis that it's easier than
proving it is never necessary.
Comment 9 Konstantin Serebryany 2011-02-25 06:52:22 UTC
Bart: of course, I assumed that AtomicDecrementByOne() implements all required barriers. You may check the way it is done in our code: 
http://src.chromium.org/viewvc/chrome/trunk/src/base/atomic_ref_count.h?view=markup

Timur: if we have a free-list of ref counted objects (or malloc/free serves as such free list) then the dispose/reuse pair (i.e. free/malloc) will also create a hb-arc. So, having an extra hb-arc from the previous life of the ref counted object will not add anything. 

Julian: no, no proof. However, the most frequent case (as above, the object is reused after free and following malloc) is prone to false negatives.
Comment 10 Konstantin Serebryany 2011-02-25 08:38:02 UTC
my bad English... 
s/prone to false negatives/protected from false negatives/
Comment 11 Bart Van Assche 2011-02-25 12:54:51 UTC
(In reply to comment #8)
> But to turn the question around, do you have a formal
> or semi-formal argument (proof) that not 'forgetting' previous events
> is never going to lead to false negatives?  Personally, I'm lazy :)
> and if it's difficult to construct such a proof, I'd prefer to
> implement a _FORGET_ macro, on the basis that it's easier than
> proving it is never necessary.

Existing mechanisms (freeing memory) should be sufficient. Additionally, if a user inserts such a _FORGET_ macro at the wrong place that can result in a false negative, isn't it ?
Comment 12 Julian Seward 2011-03-10 22:54:51 UTC
(In reply to comment #4)
> We never found a need to 'forget' previous happens_before events. 

A different reason to have a 'forget' macro is to help avoid resource
leaks in the tools.  It could happen that an application creates a
large number of objects subject to thread-safe reference counting.
Then HAPPENS_BEFORE / HAPPENS_AFTER will be called on a large number
of different addresses.  At least in Helgrind, each such call will
result in a permanent reference to a vector timestamp being created
(2, in fact).  After a while these come to occupy a large amount of
memory.  (This really happened to me.)

A 'forget' macro could therefore help, because the tool can release
all resources (including VTSs) associated with the tracked address.

Now, you could argue that such forgetting and resource releasing
could instead be automatically done, when memory is deallocated.  But:

(1) that assumes the arguments to HAPPENS_BEFORE / HAPPENS_AFTER are
    memory addresses.  But they could be completely arbitrary machine
    words; nothing says that they have to be interpreted as memory
    addresses.

(2) it's expensive, to have to scan a mapping potentially containing
    hundreds of thousands of addresses, to see if any fall inside an
    address range that is just about to be freed.
Comment 13 Julian Seward 2011-03-10 22:57:48 UTC
(In reply to comment #11)
> Additionally, if a
> user inserts such a _FORGET_ macro at the wrong place that can result in a
> false negative, isn't it ?

That is true.  But you can also cause false negatives by putting
HAPPENS_BEFORE / HAPPENS_AFTER in the wrong place.  So I don't think
that the fact that the macro can be incorrectly used is a good
argument for not having it.
Comment 14 Konstantin Serebryany 2011-03-11 08:00:37 UTC
I partially agree with Julian about the resource problem. 
For the hb objects that reside in heap we can implement fast deletion, but that won't be zero cost. Also agree that the hb object could be an arbitrary word, although I haven't seen cases where such words would be numerous. 
Yet another point: ThreadSanitizer flushes state when it runs out of resources -- so we never observed this problem in the wild (yes, you can write a simple stress test that will exercise it).

However, I would not expect that this macro (if added) will be used by anyone who does not know the gory details of race detector. 

A side note: Dmitry Vyukov (who may want to comment here as well) mentioned that the current two annotations, HAPPENS_BEFORE/HAPPENS_AFTER are not good enough to annotate synchronization done by atomic ReleaseStore and AcquireLoad functions. 
Currently, if two threads do HAPPENS_BEFORE(x) and a the third thread does HAPPENS_AFTER(x), the detectors will draw 2 arcs: T1=>T3 and T2=>T3. This behavior is correct if the hb-relation is introduced by mutex operations or a CAS, but not correct if introduced by ReleaseStore/AcquireLoad.
For these we need to add either T1=>T3 *or* T2=>T3 (whichever ReleaseStore happened last). 
Dmitry, did I get the idea right? 
I am still not convinced that this justifies adding one more annotation though :)
Comment 15 Julian Seward 2011-03-11 12:27:41 UTC
> For these we need to add either T1=>T3 *or* T2=>T3 (whichever ReleaseStore
> happened last). 

You could _almost_ implement that semantics if a FORGET function was
available, like this

   T1, T2:   ReleaseStore(x); FORGET(x); HAPPENS_BEFORE(x);

   T2:       HAPPENS_AFTER(x); AcquireLoad(x);

By putting a FORGET immediately before the HAPPENS_BEFORE, we get
just one h-b arc, from whichever of T1 or T2 runs last.

I say "almost" because the FORGET and HAPPENS_BEFORE need to be
atomic, and obviously they aren't, in this example.
Comment 16 Julian Seward 2011-03-11 12:29:50 UTC
Duh, I meant

   T3:       HAPPENS_AFTER(x); AcquireLoad(x);
Comment 17 Konstantin Serebryany 2011-03-11 12:30:24 UTC
yea :) 
Also, HAPPENS_BEFORE should go *before* the store and HAPPENS_AFTER should go *after* the load.
Comment 18 Bart Van Assche 2011-03-11 12:38:09 UTC
(In reply to comment #14)
> A side note: Dmitry Vyukov (who may want to comment here as well) mentioned
> that the current two annotations, HAPPENS_BEFORE/HAPPENS_AFTER are not good
> enough to annotate synchronization done by atomic ReleaseStore and AcquireLoad
> functions.

IMHO load-acquire and store-release should be recognized by VEX and handled similarly to Ist_LLSC instead of being annotated with client requests.
Comment 19 Konstantin Serebryany 2011-03-11 12:42:48 UTC
>> MHO load-acquire and store-release should be recognized by VEX 
In general you can't because e.g. on x86 load-acquire and store-release are just regular load and store once you get to assembly. The difference is in the compiler barriers which are present in the code but not in the binary. 

See line 127 in http://src.chromium.org/viewvc/chrome/trunk/src/base/atomicops_internals_x86_gcc.h?annotate=53716
Comment 20 Julian Seward 2011-03-11 12:54:52 UTC
Here's a revised proposal.  I think it covers all scenarios discussed
so far.  Both macros get an extra Boolean argument:


HAPPENS_BEFORE( unsigned long tag, bool forget_before )

When forget_before == false, behaviour is as present: a h-b arc
starting at this point is added to the set associated with 'tag'.

When forget_before == true, any previous h-b arcs associated with
'tag' are first removed, and then one from this point is added.  This
allows support of ReleaseStore/AcquireLoad discussed in comments 14
and 15 without atomicity problems.

IOW, the bool allows the caller to distinguish the 'overwrite' vs
'add' scenarios.  The inability to do this, + ambiguity in the spec,
was the original cause of this bug report.



HAPPENS_AFTER(  unsigned long tag, bool forget_after )

When forget_after == false, behaviour is as present: this thread
acquires h-b arcs from any associated with 'tag'.

When forget_after == true, this thread acquires h-b arcs from any
associated with 'tag', and then all h-b arcs in 'tag' are removed.
This allows checkers to atomically discard resources associated
with 'tag' if they want.

IOW, the bool allows the caller to specify whether or not this is the
last use of 'tag', and the tool to release resources accordingly.
Comment 21 Julian Seward 2011-03-11 12:56:23 UTC
(In reply to comment #17)
> Also, HAPPENS_BEFORE should go *before* the store and HAPPENS_AFTER should go
> *after* the load.

I always have trouble figuring that bit out.  Looks like I guessed wrong 
here :-)
Comment 22 Bart Van Assche 2011-03-11 12:58:42 UTC
(In reply to comment #19)
> >> MHO load-acquire and store-release should be recognized by VEX 
> In general you can't because e.g. on x86 load-acquire and store-release are
> just regular load and store once you get to assembly. The difference is in the
> compiler barriers which are present in the code but not in the binary. 
> 
> See line 127 in
> http://src.chromium.org/viewvc/chrome/trunk/src/base/atomicops_internals_x86_gcc.h?annotate=53716

I know. My concern was about those architectures that support load-acquire and store-release at the instruction level. But I'm not sure any such architecture is already supported by VEX ?
Comment 23 Julian Seward 2011-03-11 13:29:58 UTC
(In reply to comment #22)
> those architectures that support load-acquire and store-release
> at the instruction level.

Which architectures support that?
Comment 24 Konstantin Serebryany 2011-03-11 14:48:26 UTC
>> Here's a revised proposal.
First impression: I like it! (Of course, we'll need to keep the one argument variant for compatibility and simplicity). 

But please give me some time in case I have another impression :)
Comment 25 Julian Seward 2011-03-11 15:00:41 UTC
(In reply to comment #24)
> (Of course, we'll need to keep the one argument
> variant for compatibility and simplicity). 

Yes.  I was wondering about that too.

My c-preprocessor expertise is not good enough to know whether it's
OK to add new 2-arg versions of the existing macros (I suspect not),
or whether we will need new names.  Do you know?
Comment 26 Konstantin Serebryany 2011-03-11 15:08:56 UTC
I guess we have some luck:
http://en.wikipedia.org/wiki/Variadic_macro: 
   ... However, macros can be written to count the number of arguments that have been passed.
Comment 27 Dmitry Vyukov 2011-03-11 19:37:46 UTC
Hi,

First of all, I see two separate problems here. The first problem is resource freeing for hb objects, and the second is preciseness of verification (false positives/negatives).

As for the first problem, IMVHO it should be handled [mostly] transparently (users just won't put annotations for that). For hb objects allocated in dynamic memory, it definitely must be done in free(). I believe it can be made sufficiently efficient, for example, one possible strategy would be to collect all hb objects situated in a memory block in a list located in a memory block header. As for stack-based hb objects situation is more problematic, the only realistic strategy that I can think of is to periodically/episodically probe stack pointer and flush everything "above" it. However, note that *not* flushing hb objects results not only in resource leaks, but also affects correctness (false negatives)(think of a situation when a hb object as if reused when it actually must not be reused), so ThreadSanitizer is also affected.
Annotations for hb object creation/destruction should be provided as well (ANNOTATE_ATOMIC_INIT(id)/ANNOTATE_ATOMIC_DESTROY(id)). First of all, they will handle rare cases of non-address hb objects (for example, file fd can be used to synchronize write(fd)->read(fd)). They will also help handle cases when objects are reused in a type-stable manner via pooling, and other corner cases.

Regarding preciseness/correctness. 
First, HAPPENS_BEFORE/AFTER must be executed atomically not only with regard to FORGET part but also with respect to the memory operation itself. Depending on what strategy you use (drawing hb edges from (1) all previous or (2) the last release to acquire) non-atomic execution of an annotation and respective memory access leads to either false positives or false negatives.
Then, what hb edges must be drawn depends on type of memory operation being performed (load/store/rmw) and on associated memory ordering (relaxed/acquire/release/acq_rel/seq_cst) and are rather tricky. One additional flag won't solve that. One really does not want to offload the task of determining what hb edges must drawn to a user. However, we may ask a user just what operation he is performing.
All that leads me to a conclusion that annotations ought to mimic interface of a fine-grained library of atomic operations:
ATOMIC_INIT(id);
ATOMIC_DESTROY(id);
ATOMIC_STORE_RELEASE(id, val);
ATOMIC_LOAD_ACQUIRE(id);
...
ATOMIC_FETCH_ADD_ACQUIRE(id, val);
...
ATOMIC_COMPARE_EXCAHNGE_WEAK_ACQ_REL(id, cmp, xchg);
...
etc
Note that these macros perform memory operations itself so that they are atomic with operations on shadow state so that real and shadow state do not get out of sync.
Comment 28 Dmitry Vyukov 2011-03-11 19:46:01 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > those architectures that support load-acquire and store-release
> > at the instruction level.
> 
> Which architectures support that?

IA-64 has ld.acq and st.rel instructions.
However it's generally unsolvable problem on binary level. Load-acquire is "downgraded" to plain MOV on x86, while relaxed compare and swap is "promoted" to sequentially consistent LOCK CMPXCHG. Not saying that some (potentially racy) memory accesses can be eliminated from binary code.

However, what would be really nice is do that on source level. That is, understand semantics of source code w/o any annotations (like it's done for malloc/free/pthread_xxx/etc). Of course, source code should be expressed in fine-grained terms for it to be possible. C1x/C++0x atomics form a good basis for it, and all other interfaces should be mapped to C1x/C++0x atomics (which is basically the same as annotations).
Comment 29 Julian Seward 2011-03-11 19:50:31 UTC
(In reply to comment #27)
> The first problem is resource freeing for hb objects 
> [...]
> As for the first problem, IMVHO it should be handled [mostly]
> transparently

Why?  Handling it transparently and efficiently adds implementation
complexity into systems which are already complex -- at least, too
complex for my liking.  And with the proposal in comment 20, you get
it for free anyway.
Comment 30 Julian Seward 2011-03-11 20:05:26 UTC
A minimal fix for the original bug (mis-implementation of
ANNOTATE_HAPPENS_BEFORE in Helgrind) has been committed as r11624.
Comment 31 Dmitry Vyukov 2011-03-11 20:28:39 UTC
(In reply to comment #29)
> (In reply to comment #27)
> > The first problem is resource freeing for hb objects 
> > [...]
> > As for the first problem, IMVHO it should be handled [mostly]
> > transparently
> 
> Why?  Handling it transparently and efficiently adds implementation
> complexity into systems which are already complex -- at least, too
> complex for my liking.  And with the proposal in comment 20, you get
> it for free anyway.

And... so... why you mess with all that instrumentation stuff at all? We can ask a user to annotate all memory accesses as well ;)
Provided extensive dependence on annotations it's either a way too burdensome for a user, or it's not done, or done incorrectly, or both.

As for the particular proposal, HAPPENS_BEFORE( unsigned long tag, bool forget_before ) makes sense, because it relates to either store-release or rmw-release (cross fingers that all users are able to figure that out). HAPPENS_AFTER(  unsigned long tag, bool forget_after ) is rather strange, because it relates to load-acquire followed by destroy, do not see any reason to strictly tie them together (what if I want to just destroy hb object w/o doing any memory operation?).

So how should I annotate 
compare_exchange_weak(a, cmp, xchg, memory_order_release, memory_order_acquire);
in my code using the proposal?
Comment 32 Bart Van Assche 2011-03-11 20:43:38 UTC
(In reply to comment #20)
> Here's a revised proposal.  I think it covers all scenarios discussed
> so far.  Both macros get an extra Boolean argument:
> 
> 
> HAPPENS_BEFORE( unsigned long tag, bool forget_before )
> 
> When forget_before == false, behaviour is as present: a h-b arc
> starting at this point is added to the set associated with 'tag'.
> 
> When forget_before == true, any previous h-b arcs associated with
> 'tag' are first removed, and then one from this point is added.  This
> allows support of ReleaseStore/AcquireLoad discussed in comments 14
> and 15 without atomicity problems.
> 
> IOW, the bool allows the caller to distinguish the 'overwrite' vs
> 'add' scenarios.  The inability to do this, + ambiguity in the spec,
> was the original cause of this bug report.
> 
> 
> 
> HAPPENS_AFTER(  unsigned long tag, bool forget_after )
> 
> When forget_after == false, behaviour is as present: this thread
> acquires h-b arcs from any associated with 'tag'.
> 
> When forget_after == true, this thread acquires h-b arcs from any
> associated with 'tag', and then all h-b arcs in 'tag' are removed.
> This allows checkers to atomically discard resources associated
> with 'tag' if they want.
> 
> IOW, the bool allows the caller to specify whether or not this is the
> last use of 'tag', and the tool to release resources accordingly.

I'm not sure that anyone except a few experts will be able to remember the meaning of that second argument. Most people who see such an annotation in source code will have to look up what the meaning of that second argument is and most people who want to insert such an annotation will have to look up an example or documentation before being able to choose the correct value for the second argument.
Comment 33 Julian Seward 2011-03-11 20:46:57 UTC
(In reply to comment #31)

> HAPPENS_AFTER(  unsigned long tag, bool forget_after ) is rather strange,
> because it relates to load-acquire followed by destroy, do not see any reason
> to strictly tie them together

If HAPPENS_AFTER and FORGET are separate (non-atomic), then there is
the possibility that the tag is re-used for some other operation by
some other thread, before the FORGET happens.  This would be an error.

> (what if I want to just destroy hb object w/o
> doing any memory operation?).

Yes, this is a good point.  In fact I would like to have a FORGET(tag)
macro so as to be more general, as well as the 2-arg H_B and H_A
macros proposed.  This is despite the fact that at the moment I don't
have a scenario where it would be strictly necessary.
Comment 34 Dmitry Vyukov 2011-03-13 08:06:10 UTC
(In reply to comment #33)
> (In reply to comment #31)
> 
> > HAPPENS_AFTER(  unsigned long tag, bool forget_after ) is rather strange,
> > because it relates to load-acquire followed by destroy, do not see any reason
> > to strictly tie them together
> 
> If HAPPENS_AFTER and FORGET are separate (non-atomic), then there is
> the possibility that the tag is re-used for some other operation by
> some other thread, before the FORGET happens.  This would be an error.

May you provide an example?
AFAIS, nobody can reuse an id, because the resource is not yet freed. Consider:
HAPPENS_AFTER_AND_FORGET(obj->rc);
free(obj);
It's OK to split it into:
HAPPENS_AFTER(obj->rc);
FORGET(obj->rc);
free(obj);
Actually, FORGET must be placed "just before free()" rather than "straight after HAPPENS_AFTER". Remember that there are other usages of atomic objects than life-time management (reference counting).
Comment 35 Dmitry Vyukov 2011-03-13 08:39:13 UTC
> > (what if I want to just destroy hb object w/o
> > doing any memory operation?).
> 
> Yes, this is a good point.  In fact I would like to have a FORGET(tag)
> macro so as to be more general, as well as the 2-arg H_B and H_A
> macros proposed.  This is despite the fact that at the moment I don't
> have a scenario where it would be strictly necessary.

What do you mean by "it" - separate FORGET() or H_A + flag?
Comment 36 Bart Van Assche 2012-01-16 16:48:24 UTC
Fixed in version 3.7.0.