Bug 407764 - drd cond_post_wait gets wrong (?) condition on s390x z13 system
Summary: drd cond_post_wait gets wrong (?) condition on s390x z13 system
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: drd
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Andreas Arnez
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-20 13:53 UTC by Mark Wielaard
Modified: 2019-05-28 16:38 UTC
1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Patch for fixing the drd test suite failures (11.19 KB, patch)
2019-05-23 18:15 UTC, Andreas Arnez

Description Mark Wielaard 2019-05-20 13:53:27 UTC
This might be specific to RHEL8, which is built for s390x z13. I am not seeing this on Fedora 30 on s390x (which defaults to an older s390x instruction set).

With the following program on Fedora 30 s390x:

$ rpm -q valgrind glibc binutils gcc; uname -r
valgrind-3.15.0-1.fc30.s390x
glibc-2.29-12.fc30.s390x
binutils-2.31.1-29.fc30.s390x
gcc-9.1.1-1.fc30.s390x
5.0.16-200.fc29.s390x

$ cat pthread_cond.c 
// gcc -g -lpthread -o pthread_cond pthread_cond.c

#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t mutex;
static pthread_cond_t cond;
static pthread_t thread; 
static int ready = 0;

static void *ThreadFunction(void *ptr)
{
   pthread_mutex_lock(&mutex);
   ready = 1;
   pthread_cond_signal(&cond);
   pthread_mutex_unlock(&mutex);
   return NULL; 
}

int main() 
{ 
   pthread_mutex_init(&mutex, NULL); 
   pthread_cond_init(&cond, NULL);

   pthread_mutex_lock(&mutex);

   printf ("Starting thread...\n");
   pthread_create(&thread, NULL, ThreadFunction, (void*) NULL);

   printf ("Waiting to be ready...\n");
   while (!ready) {
      pthread_cond_wait(&cond, &mutex);
   }
   printf ("ready!\n");
   pthread_mutex_unlock(&mutex);

   printf ("Waiting for thread to finish...\n");
   pthread_join(thread, NULL); 
   pthread_mutex_destroy(&mutex); 
   printf("finished\n");
   return 0; 
}

$ gcc -g -lpthread -o pthread_cond pthread_cond.c
$ valgrind -q --tool=drd --trace-cond=yes ./pthread_cond
==65261== [1] cond_init       cond 0x1002090
Starting thread...
==65261== [1] cond_init       cond 0x1ffefff720
==65261== [1] cond_pre_wait   cond 0x1ffefff720
==65261== [2] cond_signal     cond 0x1ffefff720
==65261== [1] cond_post_wait  cond 0x1ffefff720
==65261== [1] cond_destroy    cond 0x1ffefff720
Waiting to be ready...
==65261== [1] cond_pre_wait   cond 0x1002090
==65261== [2] cond_signal     cond 0x1002090
==65261== [1] cond_post_wait  cond 0x1002090
ready!
Waiting for thread to finish...
finished

But on RHEL8:

# rpm -q valgrind glibc binutils gcc; uname -r
valgrind-3.14.0-11.el8.s390x
glibc-2.28-53.el8.s390x
binutils-2.30-54.el8.s390x
gcc-8.2.1-3.5.el8.s390x
4.18.0-83.el8.s390x

# valgrind -q --tool=drd --trace-cond=yes ./pthread_cond
==14549== [1] cond_init       cond 0x1002090
Starting thread...
==14549== [1] cond_init       cond 0x1ffefffd00
==14549== [1] cond_pre_wait   cond 0x1ffefffd00
==14549== [2] cond_signal     cond 0x1ffefffd00
==14549== [1] cond_post_wait  cond 0x4a2a580
==14549== condition variable has been destroyed while being waited upon: cond 0x4a2a580, mutex 0x0 locked by thread 0
==14549==    at 0x483C9CE: pthread_cond_wait_intercept (drd_pthread_intercepts.c:1086)
==14549==    by 0x483C9CE: pthread_cond_wait@* (drd_pthread_intercepts.c:1091)
==14549==    by 0x4838849: vgDrd_sema_down (drd_pthread_intercepts.c:290)
==14549==    by 0x4838849: pthread_create_intercept (drd_pthread_intercepts.c:573)
==14549==    by 0x4838849: pthread_create@* (drd_pthread_intercepts.c:584)
==14549== 
==14549== [1] cond_destroy    cond 0x1ffefffd00
==14549== destruction of condition variable being waited upon: cond 0x1ffefffd00
==14549==    at 0x483C552: pthread_cond_destroy_intercept (drd_pthread_intercepts.c:1063)
==14549==    by 0x483C552: pthread_cond_destroy@* (drd_pthread_intercepts.c:1073)
==14549==    by 0x48387BF: vgDrd_sema_destroy (drd_pthread_intercepts.c:283)
==14549==    by 0x48387BF: pthread_create_intercept (drd_pthread_intercepts.c:576)
==14549==    by 0x48387BF: pthread_create@* (drd_pthread_intercepts.c:584)
==14549== cond 0x1ffefffd00 was first observed at:
==14549==    at 0x483C0AA: pthread_cond_init_intercept (drd_pthread_intercepts.c:1022)
==14549==    by 0x483C0AA: pthread_cond_init@* (drd_pthread_intercepts.c:1030)
==14549==    by 0x483810D: vgDrd_sema_init (drd_pthread_intercepts.c:276)
==14549==    by 0x48385D9: pthread_create_intercept (drd_pthread_intercepts.c:541)
==14549==    by 0x48385D9: pthread_create@* (drd_pthread_intercepts.c:584)
==14549== 
==14549== Destroying condition variable that is being waited upon: cond 0x1ffefffd00, mutex 0x0 locked by thread 0
==14549==    at 0x483C552: pthread_cond_destroy_intercept (drd_pthread_intercepts.c:1063)
==14549==    by 0x483C552: pthread_cond_destroy@* (drd_pthread_intercepts.c:1073)
==14549==    by 0x48387BF: vgDrd_sema_destroy (drd_pthread_intercepts.c:283)
==14549==    by 0x48387BF: pthread_create_intercept (drd_pthread_intercepts.c:576)
==14549==    by 0x48387BF: pthread_create@* (drd_pthread_intercepts.c:584)
==14549== 
Waiting to be ready...
==14549== [1] cond_pre_wait   cond 0x1002090
==14549== [2] cond_signal     cond 0x1002090
==14549== [1] cond_post_wait  cond 0x1002090
ready!
Waiting for thread to finish...
finished
Comment 1 Mark Wielaard 2019-05-20 14:36:01 UTC
One odd thing is that if you try to add some extra debug statements like:

diff --git a/drd/drd_pthread_intercepts.c b/drd/drd_pthread_intercepts.c
index d6abd4368..b15b75945 100644
--- a/drd/drd_pthread_intercepts.c
+++ b/drd/drd_pthread_intercepts.c
@@ -1080,9 +1080,11 @@ int pthread_cond_wait_intercept(pthread_cond_t *cond, pthread_mutex_t *mutex)
    int   ret;
    OrigFn fn;
    VALGRIND_GET_ORIG_FN(fn);
+ printf ("pthread_cond_wait_intercept pre cond: %p\n", cond);
    VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__PRE_COND_WAIT,
                                    cond, mutex, DRD_(mutex_type)(mutex), 0, 0);
    CALL_FN_W_WW(ret, fn, cond, mutex);
+ printf ("pthread_cond_wait_intercept post cond: %p\n", cond);
    VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__POST_COND_WAIT,
                                    cond, mutex, 1, 0, 0);
    return ret;

Note that with this change the --trace-cond=yes debug statements now seem to print the correct cond address, but the new printf debug statements print the wrong cond addresses.

So it might be some strange interaction with the VALGRIND_DO_CLIENT_REQUEST_STMT and CALL_FN_W_WW macros?
Comment 2 Bart Van Assche 2019-05-20 14:50:59 UTC
Have you considered changing ", cond" into ", &cond" in the printf() statement?
Comment 3 Mark Wielaard 2019-05-20 15:14:23 UTC
(In reply to Bart Van Assche from comment #2)
> Have you considered changing ", cond" into ", &cond" in the printf()
> statement?

But cond is already a pointer to a pthread_cond_t. 

I suspect it is some strange interaction between the VALGRIND_GET_ORIG_FN, VALGRIND_DO_CLIENT_REQUEST_STMT and CALL_FN_W_WW macros.
Comment 4 Mark Wielaard 2019-05-20 15:20:19 UTC
(In reply to Bart Van Assche from comment #2)
> Have you considered changing ", cond" into ", &cond" in the printf()
> statement?

But cond is already a pointer to a pthread_cond_t. And none of the macros seems to take the address of any of its arguments.

I suspect it is some strange interaction between the VALGRIND_GET_ORIG_FN, VALGRIND_DO_CLIENT_REQUEST_STMT and CALL_FN_W_WW. Maybe bad register constraints?

It also reproduces on Fedora when building with CFLAGS=-march=z13.
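
To illustrate the "bad register constraints" idea: if an inline asm statement modifies a register that is missing from its clobber list, the compiler may keep a live value in that register across the statement and silently read back garbage afterwards. A minimal, purely hypothetical s390x-only demo (not taken from valgrind's macros; the function name and the choice of %r2 are illustrative):

/* clobber_demo.c - gcc -O2 -o clobber_demo clobber_demo.c (s390x only) */
#include <stdio.h>

static long __attribute__((noinline)) pass_through(long x)
{
   /* On s390x the first integer argument arrives in %r2 and the return
      value is also delivered in %r2.  The asm below overwrites %r2 but
      deliberately omits "r2" from the clobber list, so the compiler is
      free to assume x is still live in %r2 afterwards. */
   __asm__ volatile("lghi %%r2,0" ::: "memory");
   return x;   /* with -O2 this typically returns 0 instead of x */
}

int main(void)
{
   printf("pass_through(42) = %ld\n", pass_through(42));  /* expect 42, likely prints 0 */
   return 0;
}

A wrong or incomplete clobber/constraint list in CALL_FN_W_WW would produce exactly this kind of silent corruption of a local such as cond.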
Comment 5 Mark Wielaard 2019-05-20 16:20:18 UTC
Maybe related, or maybe not. memcheck/tests/wrap6.vgtest fails on s390x.

Without -march=z13 fn_0, fn_1, fn_2 and fn_3 fail.
With -march=z13 fn_0, fn_1, fn_2 and fn_3 also fail, plus fn_10, fn_11 and fn_12.

pthread_cond_wait_intercept() wraps a function with 2 arguments.
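
For reference, a minimal sketch of what such a 2-argument wrapper looks like, modelled on the function-wrapping macros documented in the valgrind manual and used by the wrap tests and by drd_pthread_intercepts.c. The function name addup and the file layout are illustrative only:

/* wrap_two_args.c - gcc -g -o wrap_two_args wrap_two_args.c
   (assumes valgrind.h from valgrind-devel is on the include path) */
#include <stdio.h>
#include <valgrind/valgrind.h>

/* The function to be wrapped; must be a real, non-inlined symbol. */
__attribute__((noinline))
long addup(long a, long b) { return a + b; }

/* The wrapper; valgrind's redirection machinery routes calls to addup
   here when the program runs under valgrind. */
long I_WRAP_SONAME_FNNAME_ZU(NONE, addup)(long a, long b)
{
   long   ret;
   OrigFn fn;
   VALGRIND_GET_ORIG_FN(fn);        /* fetch the original addup */
   CALL_FN_W_WW(ret, fn, a, b);     /* word result, two word arguments */
   return ret;
}

int main(void)
{
   printf("addup(2,3) = %ld\n", addup(2, 3));
   return 0;
}

CALL_FN_W_WW is the same "word result, two word arguments" call sequence used by the pthread_cond_wait intercept, which is presumably why the same breakage would show up in both wrap6 and the drd intercepts.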
Comment 6 Andreas Arnez 2019-05-23 18:15:44 UTC
Created attachment 120275 [details]
Patch for fixing the drd test suite failures

This patch fixes the drd test suite failures as well as the "wrap6" failure in my testing.
Comment 7 Mark Wielaard 2019-05-24 14:43:23 UTC
Tested on a non-z13 system and all wrap testcases PASS now.
Comment 8 Mark Wielaard 2019-05-24 17:26:33 UTC
Also tested on a z13 system with CFLAGS=-march=z13. All wrap tests succeed and the drd failures are gone.

(Except for drd/tests/tc04_free_lock, which fails for a completely unrelated reason.)
Comment 9 Andreas Arnez 2019-05-24 18:20:55 UTC
(In reply to Mark Wielaard from comment #8)
> Also tested on a z13 system with CFLAGS=-march=z13. All wrap tests succeed
> and the drd failures are gone.
Great!  I'll push the patch and close this bug early next week then.
Comment 10 Andreas Arnez 2019-05-28 16:38:49 UTC
Pushed as git commit c39ee0c37082e2d1723a8671fca2b3af029e2712.