Bug 492348 - Detected nondeterminism in cachegrind reports
Summary: Detected nondeterminism in cachegrind reports
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: cachegrind (other bugs)
Version First Reported In: 3.18.1
Platform: Ubuntu Linux
Priority: NOR
Severity: normal
Target Milestone: ---
Assignee: Nicholas Nethercote
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-08-29 04:31 UTC by jo.alen1@outlook.com
Modified: 2024-09-01 13:58 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
A screenshot that shows two runs for the same program running under Valgrind's Cachegrind tool. (514.99 KB, image/png)
2024-08-29 04:31 UTC, jo.alen1@outlook.com

Description jo.alen1@outlook.com 2024-08-29 04:31:03 UTC
Created attachment 173070 [details]
A screenshot that shows two runs for the same program running under Valgrind's Cachegrind tool.

Hi Valgrind Maintainers and Team!

I have been using Valgrind to check programs for nondeterministic behavior, and I know the Valgrind suite ships several subtools. My question concerns the Cachegrind subtool, specifically the reports it generates after each run against a binary. The differences in values are fairly small, but I am raising the issue because 34 out of 41 repositories we tested showed changes in their Cachegrind report files between runs, while the remaining repositories reported consistent values across the board. Could the team/maintainers explain the reason for this nondeterminism or suggest a remedy?

Thank you very much!

STEPS TO REPRODUCE
1. Use a GitHub Actions runner with Ubuntu 22.04 + Valgrind 3.18.1 pulled from the Ubuntu package manager
2. Run your executable with Cachegrind enabled. 
3. Convert the Cachegrind output files to a human-readable format with cg_annotate (example commands below).
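
For illustration, a command sequence along these lines matches those steps (file names here are placeholders, not the exact ones we used):

valgrind --tool=cachegrind --cachegrind-out-file=cachegrind.out.run1 ./your_program
cg_annotate cachegrind.out.run1 > run1.txt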

OBSERVED RESULT
Between two sample runs of the hashcat program (which showed flakiness in 6 out of 10 runs), several fields in the report change noticeably. The percentages look the same to me apart from a few fields, but the absolute values recorded for several fields are quite different.

ADDITIONAL INFORMATION
Refer to the attached screenshot of the difference; the red and green mark removals and additions respectively.
Comment 1 Paul Floyd 2024-08-30 04:09:27 UTC
Is the test exe totally deterministic?

An example of non-determinism is a threaded application that uses spinlocks. The number of CPU instructions spent in the spins will vary from run to run.

We can’t do much with just a screenshot. Can you provide a small reproducer?
Comment 2 Paul Floyd 2024-08-30 04:16:34 UTC
370k instructions is not a lot and most of them are in the link loader. Looks like there is variation in looking for glibc tunables. Can you be certain that for both runs the environment variables are strictly identical? If either the number of env vars or any of their names change I would expect there to be a difference in the number of CPU cycles required for “getenv()”.
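
One simple way to check that, for example, is to dump and sort the environment inside each run and diff the results:

env | sort > run1.env   # inside the first run
env | sort > run2.env   # inside the second run
diff run1.env run2.env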
Comment 3 jo.alen1@outlook.com 2024-08-31 19:29:56 UTC
(In reply to Paul Floyd from comment #1)
> Is the test exe totally deterministic?
> 
> An example of non-determinism is a threaded application that uses spinlocks.
> The number of CPU instructions spent in the spins will vary from run to run.
> 
> We can’t do much with just a screenshot. Can you provide a small reproducer?

Of course! I'm sorry for the late reply; I had some other things keeping me busy. I should have been more specific about whether the test exe is totally deterministic. I believe it is, yet on GitHub Actions it showed nondeterministic reporting in 6 out of 10 runs. Since GitHub Actions probably adds extra program overhead before the main executable starts, I tried on my local Ubuntu 22.04 desktop and found that running the standalone `./hashcat` command showed no nondeterminism in any of 10 runs and was very consistent across them. However, when I ran hashcat with more options...
> hashcat -D 1 -m 0 -a 0 -o cracked.txt example.hashes example.wordlist 
(you can use the example.hashes and wordlist given in this folder: https://drive.google.com/drive/folders/1ncz7tR1xrhSNBtysd6RWd0DLehsIMSI7?usp=sharing)
...I saw nondeterminism in only 5 out of 10 runs, while the other five runs showed very similar or no variation in their logs after exporting them with cg_annotate.

To reproduce the hashcat example from the screenshot:
1. You'll need to fork the hashcat GitHub repository and create a GitHub Action to run Valgrind on the repository
2. In that Action, check out the repository and run `make` in its root folder
3. Run Cachegrind on the ./hashcat executable ten consecutive times and run cg_annotate on each output file (see the example commands after this list)
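
For illustration, the run step in that Action could look roughly like this (not the exact workflow I used; options and file names are placeholders):

make
for i in $(seq 1 10); do
  valgrind --tool=cachegrind --cachegrind-out-file=cachegrind.out.$i ./hashcat
  cg_annotate cachegrind.out.$i > report.$i.txt
done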

I also created a C file to demonstrate some variation of nondeterminism using the spinlocks you mentioned; that program showed meaningful nondeterminism in 5 out of 10 local runs, with the percentages changing considerably between runs.

To reproduce the small reproducer example:
1. Download the C file from this drive link: https://drive.google.com/file/d/1QIVUAFuWbPcPAas8pfCz3H1-E0xxgmf8/view?usp=sharing
2. Compile it with gcc, enabling pthread support (`gcc -pthread`)
3. Run Cachegrind on the resulting executable 10 times, then use cg_annotate to get a readable version of each Cachegrind log (a rough sketch of this kind of reproducer follows these steps)
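
For anyone who would rather not download the file, a minimal sketch of this kind of spinlock reproducer (my own illustration, not necessarily identical to the file on the link) looks something like this:

/* spin_repro.c - minimal pthread spinlock demo. The number of busy-wait
 * iterations spent inside pthread_spin_lock() depends on thread scheduling,
 * so the instruction counts Cachegrind records can vary from run to run.
 *
 * build: gcc -pthread -O2 spin_repro.c -o spin_repro
 * run:   valgrind --tool=cachegrind ./spin_repro
 *        cg_annotate cachegrind.out.<pid>
 */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    100000

static pthread_spinlock_t lock;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_spin_lock(&lock);    /* time spent spinning here differs per run */
        counter++;
        pthread_spin_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_spin_destroy(&lock);

    printf("counter = %ld\n", counter);
    return 0;
}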
Comment 4 Sam James 2024-09-01 10:12:16 UTC
For what it's worth, some advice when making these reports (as a maintainer, but not of Valgrind):
* Don't rely on screenshots. It has a certain smell but also, not everyone can even see screenshots (e.g. screenreader users).
* Don't file multiple bugs if they're based on a tool. File one, mention you have more, and ask how the project wants you to proceed. It might be that there's information they need you to provide so you can make the other reports better (avoids needless roundtrips).
* Include clear instructions as to how to reproduce it both with the tool _and ideally without_ (analyse the issue a bit please, don't just say "my tool found X").
Comment 5 jo.alen1@outlook.com 2024-09-01 13:58:30 UTC
(In reply to Sam James from comment #4)
> For what it's worth, some advice when making these reports (as a maintainer,
> but not of Valgrind):
> * Don't rely on screenshots. It has a certain smell but also, not everyone
> can even see screenshots (e.g. screenreader users).
> * Don't file multiple bugs if they're based on a tool. File one, mention you
> have more, and ask how the project wants you to proceed. It might be that
> there's information they need you to provide so you can make the other
> reports better (avoids needless roundtrips).
> * Include clear instructions as to how to reproduce it both with the tool
> _and ideally without_ (analyse the issue a bit please, don't just say "my
> tool found X").

Ah, thank you for that! I will keep it in mind. I filed a separate bug report for each Valgrind tool because each had its own issue that I analyzed. As for the screenshots, they seemed an intuitive way to show the differences, and I can share the individual reports if needed. Mainly, I would like a remedy for the situation I encountered, or some insight into my observations, so that I don't put as much effort into reproduction steps if what I observed is actually expected behavior.