Bug 268623 - s390x: excessive regtest runtimes on z900
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Florian Krohm
URL:
Keywords:
Depends on:
Blocks: 268618
Reported: 2011-03-16 04:43 UTC by Florian Krohm
Modified: 2011-09-11 14:44 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Florian Krohm 2011-03-16 04:43:37 UTC
Version:           unspecified
OS:                Linux

Certain regression tests run for a very long time and possibly loop. This happens on a z900 machine. (I usually kill the process after I lose patience.)
The testcases are helgrind/tests/annotate_hbefore and helgrind/tests/pth_barrier3.


Reproducible: Always
Comment 1 Florian Krohm 2011-03-16 04:44:21 UTC
assigned to myself
Comment 2 Christian Borntraeger 2011-06-10 15:11:36 UTC
I was able to reproduce the endless loop sometimes on a z800.
Looks like this code

[...]
void* thread_fn1 ( void* arg )
{
  UWord* w = (UWord*)arg;
  delay100ms();    // ensure t2 gets to its wait first
[...]

does not reliably ensure that t2 gets to its wait first on a slow machine. Changing delay100ms to wait for 500ms helps. Still looking for a better solution.
Comment 3 Philippe Waroquiers 2011-06-10 23:28:23 UTC
(In reply to comment #2)
> I was able to reproduce the endless loop sometimes on a z800.
> Looks like this code

This problem might not be specific to s390x.
I also encountered an infinite loop in annotate_hbefore, but on a linux/amd64 system.
I obtain an infinite loop on:
  a 4 CORE model name      : Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz
It works ok on:
  a 24 CORE, model name      : Intel(R) Xeon(R) CPU           X7460  @ 2.66GHz
The kernel version on these two systems is the same (rhel 2.6.18-92.el5)

Julian gave the following hypothesis:

"I wonder if this is some interaction of V's unfair scheduling 
and the lack of a load fence in the spin loop in do_wait()."

So, adding a load fence (I have no idea how to do that) might solve the problem
on linux/amd64 and maybe also on s390?
Comment 4 Christian Borntraeger 2011-06-16 10:53:55 UTC
Florian,

regarding pth_barrier3: This testcase requires ~500MB. Your CDS system only had 256MB. Can you retest with the bigger system (and swap) or try to apply this patch:

--- drd/tests/pth_barrier.c	(revision 11793)
+++ drd/tests/pth_barrier.c	(working copy)
@@ -81,7 +81,10 @@
     t[i].b = &b;
     t[i].array = array;
     t[i].iterations = iterations;
-    pthread_create(&t[i].tid, 0, (void*(*)(void*))threadfunc, &t[i]);
+    if (pthread_create(&t[i].tid, 0, (void*(*)(void*))threadfunc, &t[i])) {
+       printf("Error creating all threads!\n");
+       abort();
+    }
   }
 
   for (i = 0; i < nthread; i++)
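
Editorial sketch, not part of the patch above. An unchecked pthread_create can presumably turn into a hang because the barrier still expects nthread participants while fewer threads actually exist. Assuming that reading, a variant that reports the error code and returns the number of threads actually started might look like this. The names start_threads and noop_worker are hypothetical and do not appear in drd/tests/pth_barrier.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker used only to exercise the helper below. */
static void *noop_worker(void *arg)
{
    return arg;
}

/* Hypothetical helper: try to start nthread workers and stop at the
 * first failure. pthread_create returns an error number (it does not
 * set errno), so the return value is what gets reported.
 * Returns the number of threads that were actually started. */
static int start_threads(pthread_t *tids, int nthread,
                         void *(*fn)(void *), void *arg)
{
    int i;

    assert(tids != NULL && fn != NULL);
    for (i = 0; i < nthread; i++) {
        int err = pthread_create(&tids[i], NULL, fn, arg);
        if (err != 0) {
            fprintf(stderr, "pthread_create failed for thread %d: %s\n",
                    i, strerror(err));
            break;
        }
    }
    return i;
}
```

A caller can compare the return value with nthread and either abort, as the patch does, or join the threads that did start before giving up.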
Comment 5 Florian Krohm 2011-08-11 17:34:24 UTC
(In reply to comment #4)
> Florian,
> 
> regarding pth_barrier3: This testcase requires ~500MB. Your CDS system only had
> 256MB. Can you retest with the bigger system (and swap) or try to apply this
> patch:
> 

This testcase now runs through quickly with the same diffs as your nightly runs (plus an additional size 1 vs size 4 difference due to MVC). 

Philippe, thanks for sharing that insight about annotate_hbefore.
I used to kill this testcase (runs for hours if you let it... )
I added a serialization op to the loop in do_wait:

Index: helgrind/tests/annotate_hbefore.c
===================================================================
--- helgrind/tests/annotate_hbefore.c	(revision 11964)
+++ helgrind/tests/annotate_hbefore.c	(working copy)
@@ -245,8 +245,10 @@
 {
   UWord w0 = *w;
   UWord volatile * wV = w;
-  while (*wV == w0)
+  while (*wV == w0) {
+    asm volatile (".hword 0x07f0\n\t");
     ;
+  }
   ANNOTATE_HAPPENS_AFTER(w);
 }

which fixes the problem. The testcase now runs in 2 sec or so.
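
Editorial sketch of the same idea in self-contained form, for readers who want to try it outside the testcase. It is an illustration under assumptions, not the code that was committed. The names spin_fence, spin_while_equal and bump_word are hypothetical. The old value is passed in explicitly rather than read on entry as do_wait does. On anything other than s390 the sketch falls back to the GCC/Clang builtin __sync_synchronize() as a full barrier. 0x07f0 is the encoding of "bcr 15,0", which performs serialization on s390.

```c
#include <assert.h>
#include <pthread.h>

typedef unsigned long UWord;

/* Hypothetical per-architecture fence for use inside a spin loop. */
static inline void spin_fence(void)
{
#if defined(__s390x__) || defined(__s390__)
    /* bcr 15,0: serialization, same encoding as in the patch above */
    __asm__ __volatile__(".hword 0x07f0" ::: "memory");
#else
    /* Assumption: GCC/Clang full memory barrier builtin */
    __sync_synchronize();
#endif
}

/* Spin until *w no longer holds 'old'. Taking 'old' as a parameter
 * avoids sampling the word after it may already have changed. */
static void spin_while_equal(UWord *w, UWord old)
{
    UWord volatile *wV = w;

    assert(w != NULL);
    while (*wV == old)
        spin_fence();
}

/* Hypothetical writer thread: atomically bumps the word once. */
static void *bump_word(void *arg)
{
    __sync_fetch_and_add((UWord *)arg, 1);
    return NULL;
}
```

Whether a fence alone is enough is a separate question. Later comments in this bug report the hang coming back on s390x with the serialization in place.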
Comment 6 Christian Borntraeger 2011-08-12 09:55:48 UTC
>> regarding pth_barrier3: This testcase requires ~500MB. Your CDS system only had
>> 256MB. Can you retest with the bigger system (and swap) or try to apply this
>> patch:
>>
> 
> This testcase now runs through quickly with the same diffs as your nightly runs
> (plus an additional size 1 vs size 4 difference due to MVC). 

Good. We should consider applying my patch anyway, otherwise any user of these
old CDS systems will face the same endless loop.

Christian
Comment 7 Florian Krohm 2011-08-12 15:13:31 UTC
(In reply to comment #6)
> 
> Good. We should consider applying my patch anyway, otherwise any user of these
> old CDS systems will face the same endless loop.
> 

Done in r11967
Comment 8 Florian Krohm 2011-08-12 15:19:06 UTC
(In reply to comment #3)
> Julian gave the following hypothesis:
> 
> "I wonder if this is some interaction of V's unfair scheduling 
> and the lack of a load fence in the spin loop in do_wait()."
> 
> So, adding a load fence (I have no idea how to do that) might solve the problem
> on linux/amd64 and maybe also on s390 ?

As I mentioned in another comment, adding a serialization insn worked on s390x.
If you want to try on x86 use one of the insns that is used to implement
an Xin_MFence.  See host_x86_defs.c around line 2533.
Comment 9 Florian Krohm 2011-09-09 21:03:57 UTC
helgrind/tests/annotate_hbefore

In r12008 I added a load fence as described in comment #5. That helped then.
I'm positive. But today it's hanging again. So I'm disabling this test on s390x for now so I can get a regtest through on z900. (r12019)
Comment 10 Florian Krohm 2011-09-09 22:31:40 UTC
Regtest now runs through in just over an hour elapsed time. Not too bad for that old piece of iron (z900).
Comment 11 Philippe Waroquiers 2011-09-10 05:06:13 UTC
(In reply to comment #10)
> Regtest now runs through in just over an hour elapsed time. Not too bad for
> that old piece of iron (z900).
I tried to fix it on amd64 by adding an "sfence" instruction, but it did
not help.

A question about the state of Valgrind tests on s390: is it OK to now run
all tests on the CDS system?
Or is it still preferable not to run some of them?
Comment 12 Florian Krohm 2011-09-10 12:50:01 UTC
As of r12019 you should be able to do "make regtest" and not have it hang.
Comment 13 Christian Borntraeger 2011-09-11 12:09:28 UTC
See my comment #2. The code is obviously broken. If for some reason (loaded system, lots of processes) the other thread does not pass a certain point within 100ms, the code will livelock, since we wake up before the other thread waits.

Christian
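
Editorial sketch of the ordering problem described above, and of one way to remove the dependence on timing. It is an assumption-laden illustration, not what was changed in the testcase: struct handshake, handshake_init, waiter_fn and signaller are hypothetical names, and the mutex and condition variable used here would themselves create happens-before edges, so this form is probably not suitable for annotate_hbefore as-is. The point is only that the signaller proceeds after the waiter has taken its snapshot, instead of after a fixed delay.

```c
#include <assert.h>
#include <pthread.h>

typedef unsigned long UWord;

/* Hypothetical shared state for the handshake. */
struct handshake {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int             waiter_ready;  /* set once the waiter has its snapshot */
    UWord           word;          /* the word the waiter spins on */
};

static void handshake_init(struct handshake *h)
{
    assert(h != NULL);
    pthread_mutex_init(&h->mu, NULL);
    pthread_cond_init(&h->cv, NULL);
    h->waiter_ready = 0;
    h->word = 0;
}

/* Waiter: snapshot the word first, then announce readiness, then spin. */
static void *waiter_fn(void *arg)
{
    struct handshake *h = arg;
    UWord volatile *wV = &h->word;
    UWord w0 = *wV;

    pthread_mutex_lock(&h->mu);
    h->waiter_ready = 1;
    pthread_cond_signal(&h->cv);
    pthread_mutex_unlock(&h->mu);

    while (*wV == w0)
        __sync_synchronize();  /* assumption: GCC/Clang barrier builtin */
    return NULL;
}

/* Signaller: block until the waiter is ready, then change the word. */
static void signaller(struct handshake *h)
{
    pthread_mutex_lock(&h->mu);
    while (!h->waiter_ready)
        pthread_cond_wait(&h->cv, &h->mu);
    pthread_mutex_unlock(&h->mu);

    __sync_fetch_and_add(&h->word, 1);
}
```

With this ordering the waiter cannot sample the word after it has already changed, no matter how slow or loaded the machine is.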
Comment 14 Florian Krohm 2011-09-11 14:44:35 UTC
(In reply to comment #13)
> See my comment #2. The code is obviously broken. If for some reason (loaded
> system, lots of processes) the other thread does not pass a certain point within
> 100ms, the code will livelock, since we wake up before the other thread waits.
> 

Thanks for the reminder. I missed it. Changed in r12031. Let's see how reliable that is.