Bug 394361

Summary:	[Enhancement] : Client request to control thread-yielding in valgrind
Product:	[Developer tools] valgrind	Reporter:	Manish Goel <manish.dce>
Component:	general	Assignee:	Julian Seward <jseward>
Status:	REPORTED ---
Severity:	normal
Priority:	NOR
Version First Reported In:	3.14 SVN
Target Milestone:	---
Platform:	RedHat Enterprise Linux
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	patch-file

Description Manish Goel 2018-05-17 09:12:13 UTC

Created attachment 112702 [details]
patch-file

Hi, 

I have created a valgrind client request, "VALGRIND_YIELD", which makes the currently running thread yield.
This helps in a scenario where an app has multiple consumer threads that grab objects and process them, and a data race is possible between the processing of two grabbed objects. Since helgrind by default runs a thread for 100000 basic blocks, a single consumer thread tends to grab all the objects, and hence no race shows up under helgrind.
With this client request, a consumer thread can yield to another consumer thread after a client-specified number of grabbed objects, so we can reproduce the race-causing scenario with helgrind as well.
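
As an illustration only, here is a minimal sketch of how a consumer loop might use such a request. VALGRIND_YIELD exists only in the attached patch, not in a stock valgrind.h, so the sketch falls back to sched_yield() and counts the calls when the macro is absent. The names consume, NUM_OBJECTS, processed_by and yield_calls are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* VALGRIND_YIELD is the request proposed in the attached patch; it is NOT
   part of a stock valgrind.h. Without the patch, fall back to sched_yield()
   and count the calls so that the sketch still builds and can be checked. */
static atomic_int yield_calls;
#ifndef VALGRIND_YIELD
#define VALGRIND_YIELD() \
    (atomic_fetch_add(&yield_calls, 1), (void)sched_yield())
#endif

#define NUM_OBJECTS 6                     /* hypothetical queue length */
static atomic_int next_object;            /* index of the next object to grab */
static int processed_by[NUM_OBJECTS];     /* consumer that handled each object */

/* Consumer loop: grab objects until none remain, yielding after every
   `yield_every` objects so that another consumer gets a turn.
   Returns the number of objects this consumer processed. */
static int consume(int consumer_id, int yield_every)
{
    assert(yield_every > 0);
    int done = 0;
    for (;;) {
        int i = atomic_fetch_add(&next_object, 1);   /* grab one object */
        if (i >= NUM_OBJECTS)
            break;
        processed_by[i] = consumer_id;               /* stand-in for real work */
        if (++done % yield_every == 0)
            VALGRIND_YIELD();
    }
    return done;
}
```

Each consumer thread would call consume() with its own id. With yield_every set to 1, a consumer gives up the CPU after every object instead of draining the whole queue within one timeslice.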

I have attached the patch to this bug. Kindly review it and merge it into valgrind.

Thanks & Regards
Manish Goel

Comment 1 Julian Seward 2018-09-03 06:40:36 UTC

Did you try without your patch, but with the flag --fair-sched=yes ?

Comment 2 Manish Goel 2018-09-04 03:14:41 UTC

(In reply to Julian Seward from comment #1)
> Did you try without your patch, but with the flag --fair-sched=yes ?

Yes, I did try with --fair-sched=yes, but it didn't work and there were still data-races.

Further elaboration -- 
Imagine we have 3 jobs, i.e. J1, J2, J3, to be done in parallel and 2 threads, i.e. T1, T2. There is a syncpoint between the threads after all these jobs are done. The jobs are of unequal size, possibly about a few hundred basic blocks each.
These threads compete against each other to capture-&-process the jobs.
So in a normal run -- T1 would acquire, say, J1, and T2 would acquire, say, J2. Whoever finishes first (say T1) acquires J3.

But with helgrind -- only one of the threads, say T1 (picked fairly), would be scheduled to run, and it would end up capturing-&-processing all the jobs because of the 100K basic-block heuristic. Since all the jobs ran in a single thread, no data race between them would be reported.

With this patch -- we start with a thread, say T1 (picked fairly), which processes J1 and then yields (because of the newly added client request). Then thread T2 processes J2, reports data races between T1.J1-&-T2.J2, and yields. Then T1 acquires J3, reports data races between T2.J2-&-T1.J3, and goes to the syncpoint.
With the help of this patch we were able to see data races between T1.J1-&-T2.J2 and between T2.J2-&-T1.J3. We still missed data races between T1.J1-&-T1.J3, since both of those jobs ran in T1.
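
The two schedules described above can be compared with a small toy model. This is only a sketch under stated assumptions: it is not Valgrind's scheduler, it assumes exactly one thread runs at a time and keeps running until it yields, and it counts only which job pairs ended up on different threads. The names run_jobs and cross_thread_pairs are hypothetical.

```c
#include <assert.h>

#define NUM_JOBS 3      /* J1, J2, J3 */
#define NUM_THREADS 2   /* T1 == 0, T2 == 1 */

/* Toy model of a serialising scheduler (NOT Valgrind's real code): one
   thread runs at a time and keeps the CPU until it yields or the jobs run
   out. yield_every == 0 means "never yield", mimicking a timeslice that is
   much longer than all the jobs together.
   Fills owner[j] with the thread that ran job j. */
static void run_jobs(int yield_every, int owner[NUM_JOBS])
{
    assert(yield_every >= 0);
    int current = 0;        /* thread picked first (T1) */
    int since_yield = 0;    /* jobs done since the last yield */
    for (int j = 0; j < NUM_JOBS; j++) {
        owner[j] = current;
        if (yield_every > 0 && ++since_yield == yield_every) {
            current = (current + 1) % NUM_THREADS;  /* hand over to the other thread */
            since_yield = 0;
        }
    }
}

/* Count job pairs that ran on different threads. In this model, only such
   pairs can be reported as races; two jobs on the same thread are ordered. */
static int cross_thread_pairs(const int owner[NUM_JOBS])
{
    int pairs = 0;
    for (int a = 0; a < NUM_JOBS; a++)
        for (int b = a + 1; b < NUM_JOBS; b++)
            if (owner[a] != owner[b])
                pairs++;
    return pairs;
}
```

In this model, run_jobs(0, owner) puts all three jobs on T1 and gives 0 cross-thread pairs, while run_jobs(1, owner) gives J1 on T1, J2 on T2, J3 on T1 and 2 cross-thread pairs. The J1/J3 pair stays on one thread, which matches the race that was still missed.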

Kindly let me know if you have any more thoughts or suggestions on this.

Thanks & Regards
Manish Goel