Summary: | [ppc64le] VEX temporary storage exhausted with several vbpermq instructions | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Jesus Checa <jcheca> |
Component: | vex | Assignee: | Carl Love <cel> |
Status: | CLOSED FIXED | ||
Severity: | normal | CC: | cel, jseward, mark, will_schmidt |
Priority: | NOR | ||
Version First Reported In: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
patch to re-implement the vbpermq instruction using a clean helper
Fix for 32-bit Powerpc |
Description
Jesus Checa
2022-03-23 15:38:16 UTC
If you look at VEX/priv/guest_ppc_toIR.c and search for vbpermq you'll see it has a loop in which it generates VEX IR. for (i = 0; i < 16; i++) { ... I count at least 20 ops ... } so each vbpermq generates at least 320 operations. So 3 in a row generate more than a thousand VEX ops... And I wouldn't be surprised if that hits some limit Yes, it is probably generating too many iops. I will need to re-implement the support with a clean helper. Created attachment 147709 [details]
patch to re-implement the vbpermq instruction using a clean helper
The attached patch against current Valgrind mainline changes the support for the vbpermq instruction from using just Iops to use a clean helper. The clean helper significantly reduces the number of Iops thus allowing a series of multiple vbpermq instructions to be decoded without overflowing the Valgrind buffer.
Please pull down the Valgrind mainline tree, apply the patch and test it on your application to make sure it works for me. Contact me if you need help with building a patched version of Valgrind.
Please let me know if you see any additional issues with this patch. Assuming everything is fine, I will commit the patch to mainline. Thanks.
Hi Carl, Thanks for the quick patch. I verified it with a fresh valgrind repo and all looks good now. FTR, these are the steps I followed for the verification: ``` $ gcc reproducer.c -o reproducer # This is the reproducer presented in bug description $ curl https://bugsfiles.kde.org/attachment.cgi?id=147709 > vbpermq.patch $ git clone git://sourceware.org/git/valgrind.git && cd valgrind $ git apply ../vbpermq.patch $ ./autogen.sh && ./configure && make -j$(nproc) $ ./vg-in-place ../reproducer ``` No temporary storage exhausted messages now. To give it a twist, I added a ridiculously high number of vbpermq instructions (~1000) to the reproducer, and valgrind still runs perfectly, so the patch is fine from my side. Patch committed to Valgrind mainline commit 00017cda521fb3aa3e5d8b892941dbb6bd6c3c25 (HEAD -> master) Author: Carl Love <cel@us.ibm.com> Date: Wed Mar 23 13:41:16 2022 -0500 Powerpc, re-implement the vbpermq instruction support The instruction support generates too many Iops when multiple vbpermq instructions occur together in the binary. This patch changes the implementation to use a clean helper and thus avoid overflowing the internal Valgrind buffer. bugzilla 451827 Issue found with Powerpc 32-bit. Created attachment 147972 [details]
Fix for 32-bit Powerpc
The patch fixes the instruction support on 32-bit Powerpc.
Fix for 32-bit systems was tested on Power 8 BE, 32-bit and 64-bit, Power 8 LE 84-bit, Power 9, Power 10. No regressions were found. FIx for 32-bit systems committed. commit bc4dc04d5f23e363a79bade6dee475e9c2287c93 (HEAD -> master) Author: Carl Love <cel@us.ibm.com> Date: Mon Apr 4 21:31:33 2022 -0400 Powerpc 32bit, fix the vbpermq support Passing the two 128-bit vA and vB arguments doesn't work in 32-bit mode. The clean helper was changed to compute the result for 8 indexes. The helper is then called twice to get the result for the upper 64-bits of the vB register and the lower 64-bits of the vB register. The patch is an additional fix for bugzilla 451827. |