SUMMARY

Valgrind will fail with "VEX temporary storage exhausted" when running a ppc64le binary which contains several vbpermq instructions together. The following C source reproduces the issue:

```
int main(){
    asm("vbpermq 6,6,10");
    asm("vbpermq 7,7,10");
    asm("vbpermq 8,8,10");
    // if this doesn't trigger the issue, just copy-paste some asm lines.
    return 0;
}
```

Building and running this will produce the following output:

```
# ./vg-in-place ../a.out
==592850== Memcheck, a memory error detector
==592850== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==592850== Using Valgrind-3.19.0.GIT and LibVEX; rerun with -h for copyright info
==592850== Command: ../a.out
==592850==
VEX temporary storage exhausted.
Pool = TEMP, start 0x598a5528 curr 0x59d5b420 end 0x59d6a067 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 164492400 bytes allocated
vex storage: P total 192 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==592850==    at 0x58056FD8: show_sched_status_wrk (m_libcassert.c:406)
==592850==    by 0x5805718F: report_and_quit (m_libcassert.c:477)
==592850==    by 0x5805740B: panic (m_libcassert.c:553)
==592850==    by 0x5805740B: vgPlain_core_panic_at (m_libcassert.c:558)
==592850==    by 0x5805745B: vgPlain_core_panic (m_libcassert.c:563)
==592850==    by 0x58076D37: failure_exit (m_translate.c:761)
==592850==    by 0x581867FB: vpanic (main_util.c:253)
==592850==    by 0x581868AB: private_LibVEX_alloc_OOM (main_util.c:181)
==592850==    by 0x5827C0C3: LibVEX_Alloc_inline (main_util.h:176)
==592850==    by 0x5827C0C3: addHInstr_SLOW (host_generic_regs.c:332)
==592850==    by 0x58250F7F: addHInstr (host_generic_regs.h:402)
==592850==    by 0x58252B8B: emit_instr (host_generic_reg_alloc3.c:301)
==592850==    by 0x58252B8B: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1320)
==592850==    by 0x58183FE3: libvex_BackEnd (main_main.c:1133)
==592850==    by 0x58183FE3: LibVEX_Translate (main_main.c:1236)
==592850==    by 0x5807A82F: vgPlain_translate (m_translate.c:1831)
==592850==    by 0x580CD1FB: handle_tt_miss (scheduler.c:1141)
==592850==    by 0x580CD1FB: vgPlain_scheduler (scheduler.c:1503)
==592850==    by 0x5812209F: thread_wrapper (syswrap-linux.c:101)
==592850==    by 0x5812209F: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 592850)
==592850==    at 0x100005CC: main (in /root/a.out)
client stack range: [0x1FFEFF0000 0x1FFF00FFFF] client SP: 0x1FFF00E740
valgrind stack range: [0x1002DF0000 0x1002EEFFFF] top usage: 15728 of 1048576
```

Originally found in rhel-9 with valgrind 3.18.1, but reproducible in fresh git build (see https://bugzilla.redhat.com/show_bug.cgi?id=2067187)

I found this similar to this already fixed issue: https://bugs.kde.org/show_bug.cgi?id=375839
If you look at VEX/priv/guest_ppc_toIR.c and search for vbpermq, you'll see it has a loop in which it generates VEX IR:

    for (i = 0; i < 16; i++) { ... I count at least 20 ops ... }

So each vbpermq generates at least 320 operations (16 iterations times at least 20 ops), and 3 in a row generate on the order of a thousand VEX ops. I wouldn't be surprised if that hits some limit.
Yes, it is probably generating too many Iops. I will need to re-implement the support with a clean helper.
Created attachment 147709
patch to re-implement the vbpermq instruction using a clean helper

The attached patch against current Valgrind mainline changes the support for the vbpermq instruction from using just Iops to using a clean helper. The clean helper significantly reduces the number of Iops, thus allowing a series of multiple vbpermq instructions to be decoded without overflowing the Valgrind buffer.

Please pull down the Valgrind mainline tree, apply the patch and test it on your application to make sure it works for you. Contact me if you need help with building a patched version of Valgrind.

Please let me know if you see any additional issues with this patch. Assuming everything is fine, I will commit the patch to mainline.

Thanks.
Hi Carl,

Thanks for the quick patch. I verified it with a fresh valgrind repo and all looks good now. FTR, these are the steps I followed for the verification:

```
$ gcc reproducer.c -o reproducer # This is the reproducer presented in bug description
$ curl https://bugsfiles.kde.org/attachment.cgi?id=147709 > vbpermq.patch
$ git clone git://sourceware.org/git/valgrind.git && cd valgrind
$ git apply ../vbpermq.patch
$ ./autogen.sh && ./configure && make -j$(nproc)
$ ./vg-in-place ../reproducer
```

No temporary storage exhausted messages now.

To give it a twist, I added a ridiculously high number of vbpermq instructions (~1000) to the reproducer, and valgrind still runs perfectly, so the patch is fine from my side.
Patch committed to Valgrind mainline

commit 00017cda521fb3aa3e5d8b892941dbb6bd6c3c25 (HEAD -> master)
Author: Carl Love <cel@us.ibm.com>
Date:   Wed Mar 23 13:41:16 2022 -0500

    Powerpc, re-implement the vbpermq instruction support

    The instruction support generates too many Iops when multiple vbpermq
    instructions occur together in the binary. This patch changes the
    implementation to use a clean helper and thus avoid overflowing the
    internal Valgrind buffer.

    bugzilla 451827
Issue found with Powerpc 32-bit.
Created attachment 147972
Fix for 32-bit Powerpc

The patch fixes the instruction support on 32-bit Powerpc.
Fix for 32-bit systems was tested on Power 8 BE, 32-bit and 64-bit, Power 8 LE 64-bit, Power 9, Power 10. No regressions were found.
Fix for 32-bit systems committed.

commit bc4dc04d5f23e363a79bade6dee475e9c2287c93 (HEAD -> master)
Author: Carl Love <cel@us.ibm.com>
Date:   Mon Apr 4 21:31:33 2022 -0400

    Powerpc 32bit, fix the vbpermq support

    Passing the two 128-bit vA and vB arguments doesn't work in 32-bit mode.
    The clean helper was changed to compute the result for 8 indexes. The
    helper is then called twice to get the result for the upper 64-bits of the
    vB register and the lower 64-bits of the vB register.

    The patch is an additional fix for bugzilla 451827.
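For readers following along, here is a rough C model of the "8 indexes per call, called twice" idea described in the commit message. This is only a sketch based on my reading of the vbpermq definition in the Power ISA (bit 0 is the most significant bit of vA, and an index of 128 or more selects a zero bit). The names `vbpermq_half_sketch` and `vbpermq_sketch` and their signatures are hypothetical and are not the helper in the committed patch.

```c
#include <stdint.h>

/* Hypothetical model of a helper that handles 8 index bytes at a time.
 * vA is passed as two 64-bit halves; vB_half holds 8 index bytes, with
 * the first index in the most significant byte.
 * Returns 8 result bits, the first index's bit in the most significant
 * position of the low byte. */
uint64_t vbpermq_half_sketch(uint64_t vA_hi, uint64_t vA_lo, uint64_t vB_half)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        /* Extract index byte i, counting from the most significant byte. */
        unsigned index = (unsigned)((vB_half >> (56 - 8 * i)) & 0xFF);
        uint64_t bit = 0;

        if (index < 64)
            bit = (vA_hi >> (63 - index)) & 1;   /* bits 0..63 of vA */
        else if (index < 128)
            bit = (vA_lo >> (127 - index)) & 1;  /* bits 64..127 of vA */
        /* index >= 128 leaves the bit at zero */

        result = (result << 1) | bit;
    }
    return result;
}

/* Two calls, one per 64-bit half of vB, combined into the 16-bit
 * permuted value that vbpermq places in the upper doubleword of vT. */
uint64_t vbpermq_sketch(uint64_t vA_hi, uint64_t vA_lo,
                        uint64_t vB_hi, uint64_t vB_lo)
{
    return (vbpermq_half_sketch(vA_hi, vA_lo, vB_hi) << 8)
         |  vbpermq_half_sketch(vA_hi, vA_lo, vB_lo);
}
```

Since every argument is at most 64 bits wide, a helper of this shape sidesteps the problem of passing whole 128-bit values, which the commit message says does not work in 32-bit mode. The translated code then contains two calls per instruction instead of the long per-index Iop sequence.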