Created attachment 145143
proposed patch to add clobber masks to inline asm statements

SUMMARY
***
On a Power10-based server, valgrind segfaults during initialization. (Problem found by Tulio, investigated by Tulio and Will.)

After some debugging we found that r20 is being clobbered at coregrind/m_machine.c:1417:

    __asm__ __volatile__(".long 0x7f1401b6"); /* brh RA, RS */

A preliminary patch to add clobber options to the asm stanzas is attached.
***

STEPS TO REPRODUCE
1. Run valgrind on a Power10 system.
2. Observe a segfault early in valgrind startup.

OBSERVED RESULT
segfault

EXPECTED RESULT
no segfault

SOFTWARE/OS VERSIONS
Linux on PowerPC / Power10

ADDITIONAL INFORMATION
This problem has only been seen on Power10, and occurs due to an instruction only exercised on Power10. However, some of the other inline asm statements could have similar side effects. The patch should address these as well.
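For readers unfamiliar with clobber lists, here is a minimal sketch of what "adding a clobber" to such a stanza looks like in GCC extended asm. The function name and its parameter are hypothetical and exist only for illustration; this is not the code from the attached patch. The sketch assumes a GCC-compatible compiler, and the asm is compiled only on 64-bit PowerPC. The raw encoding raises SIGILL on CPUs without ISA 3.1 support, so the sketch executes it only when the caller says it is safe.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only; not Valgrind's actual code.
   Returns 1 if the raw instruction word was executed, 0 otherwise. */
int exec_brh_probe(int cpu_known_to_support_isa_3_1)
{
#if defined(__GNUC__) && defined(__powerpc64__)
    if (cpu_known_to_support_isa_3_1) {
        /* The third colon-separated section is the clobber list. Naming
           r20 there tells the compiler that the asm overwrites r20, so it
           must not keep a live value in that register across the statement
           (and must preserve the caller's value if r20 is callee-saved). */
        __asm__ __volatile__(".long 0x7f1401b6" /* brh r20, r24 */
                             : /* no outputs */
                             : /* no inputs */
                             : "r20");
        return 1;
    }
#else
    /* On other targets the asm is not compiled at all. */
    (void)cpu_known_to_support_isa_3_1;
#endif
    return 0;
}
```

Without the clobber, the compiler treats the statement as an opaque string that touches no registers, which matches the failure described in the report.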
Created attachment 145669
updated patch to add clobber masks to inline asm statements

Updated version of the patch from Will
Patch committed

commit 3ea8d4327003c3cefe8e82c59be8e92dcfe1a60f (HEAD -> master, origin/master, origin/HEAD)
Author: Carl Love <cel@us.ibm.com>
Date:   Fri Jan 14 23:04:44 2022 +0000

    Assorted changes to protect from side affects from the feature checking code.

    Patch contributed by Will Schmidt <will_schmidt@vnet.ibm.com>

    This problem was initially reported by Tulio, he assisted me in
    identifying the underlying issue here.

    This was discovered on a Power10, and occurs since the ISA 3.1 support
    check uses the brh instruction via a hardcoded ".long 0x7f1401b6" asm
    stanza. That encoding writes to r20, and since the stanza does not
    contain a clobber the compiler did not know to save or restore that
    register upon entry or exit. The junk value remaining in r20
    subsequently caused a segfault.

    This patch adds clobber masks to the instruction stanzas, as well as
    updates the associated comments to clarify which registers are being
    used. As part of this change I've also
    - updated the .long for the cnttzw instruction to write to r20, and
      zeroed the reserved bits from that instruction so it is properly
      decoded by the disassembler.
    - updated the .long for the dadd instruction to write to f0.

    I've inspected the current codegen with these changes in place, and
    confirm that r20 is now saved and restored on entry and exit from the
    machine_get_hwcaps() function.

    bugzilla 447995 Valgrind segfault on power10 due to hwcap checking code
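The statement that the encoding writes to r20 can be checked by hand. The following is a small sketch that splits a 32-bit instruction word into the Power ISA X-form fields. The struct and function names are my own, not anything from Valgrind. For brh, RA is the target register and RS is the source.

```c
#include <stdint.h>

/* Fields of a Power ISA X-form instruction word.
   IBM bit numbering: bit 0 is the most significant bit. */
typedef struct {
    unsigned opcode; /* bits 0-5:   primary opcode */
    unsigned rs;     /* bits 6-10:  source register */
    unsigned ra;     /* bits 11-15: target register for brh */
    unsigned rb;     /* bits 16-20: reserved for brh */
    unsigned xo;     /* bits 21-30: extended opcode */
    unsigned rc;     /* bit 31:     reserved for brh */
} XForm;

/* Hypothetical helper: extract the X-form fields by shifting and masking. */
XForm decode_xform(uint32_t insn)
{
    XForm f;
    f.opcode = (insn >> 26) & 0x3f;
    f.rs     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.rb     = (insn >> 11) & 0x1f;
    f.xo     = (insn >> 1)  & 0x3ff;
    f.rc     = insn & 0x1;
    return f;
}
```

Applied to 0x7f1401b6, this gives primary opcode 31, RS = 24, RA = 20 and extended opcode 219, i.e. brh r20, r24, so r20 is the register that gets overwritten.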