Some processors, such as the AMD Athlon "Classic", support mmxext, a subset of sse1. VEX does not detect this subset properly. The subset uses the same encoding as the corresponding sse1 instructions.

The subset is described at:
http://support.amd.com/us/Embedded_TechDocs/22466.pdf
https://en.wikipedia.org/wiki/3DNow!#3DNow.21_extensions

The mmxext instructions are:
MASKMOVQ MOVNTQ PAVGB PAVGW PMAXSW PMAXUB PMINSW PMINUB PMULHUW PSADBW
PSHUFW PEXTRW PINSRW PMOVMSKB PREFETCHNTA PREFETCHT0 PREFETCHT1 PREFETCHT2
SFENCE

There is already a testcase for this subset:
memcheck/tests/x86/insn_mmxext
none/tests/x86/insn_mmxext

The prereq is slightly wrong, so the test is not run on Intel processors with full sse1 support. Fixing the prereq will make the test run, and pass, on those processors. These tests currently fail on AMD processors that have mmxext but not full sse1.

Reproducible: Always
Created attachment 81782 [details]
VEX part of the fix to support mmxext subset.

This introduces a new VEX_HWCAPS_X86_MMXEXT that sits between the baseline (0) and VEX_HWCAPS_X86_SSE1. There is also a new x86g_dirtyhelper_CPUID_mmxext that mimics an Athlon "Classic" (Model 2, K75 "Pluto/Orion").

To impact the instruction parser as little as possible, the patch does not change the order of instruction parsing except when only mmxext is available. In that case it uses gotos to jump through the mmxext subset. Luckily, the mmxext subset is somewhat grouped together. Since this subset also provides sfence, the code is updated slightly to take advantage of that when handling mfence.
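To illustrate the idea of a capability level that sits between the baseline and full sse1, here is a minimal sketch. It is not the code from the attachment: the macro names and bit values are hypothetical stand-ins, and the two helper functions are invented for the example. The only things taken from this report are that mmxext is a strict subset of sse1 and that sse1 implies mmxext.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. The real definitions
   live in the VEX headers and may differ. Baseline is 0 (no bits set). */
#define HWCAPS_X86_MMXEXT (1u << 1)  /* mmxext subset of sse1 */
#define HWCAPS_X86_SSE1   (1u << 2)  /* full sse1 */

/* The levels form a chain: baseline < mmxext < sse1.
   A set that claims sse1 without mmxext is treated as invalid. */
static bool hwcaps_valid(uint32_t hwcaps)
{
    if ((hwcaps & HWCAPS_X86_SSE1) && !(hwcaps & HWCAPS_X86_MMXEXT))
        return false;
    return true;
}

/* May the decoder accept an instruction from the mmxext subset? */
static bool may_decode_mmxext(uint32_t hwcaps)
{
    assert(hwcaps_valid(hwcaps));
    return (hwcaps & HWCAPS_X86_MMXEXT) != 0;
}

/* May the decoder accept an sse1 instruction outside the subset? */
static bool may_decode_sse1_only(uint32_t hwcaps)
{
    assert(hwcaps_valid(hwcaps));
    return (hwcaps & HWCAPS_X86_SSE1) != 0;
}
```

With this layout, a host with only mmxext gets the subset decoded while the rest of sse1 is still rejected, and a host with full sse1 passes both checks.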
Created attachment 81783 [details]
valgrind part of the fix to support mmxext subset.

Detects the mmxext subset from cpuid information (and also enables it when full sse1 is found). Also fixes the prereq of none/tests/x86/insn_mmxext.vgtest so that the test runs when full sse1 is found, and not only when just the mmxext subset is found. The test already passed on such configurations. With the VEX patch it also passes with just the mmxext subset.
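The detection rule described above can be sketched as follows. This is an assumption-laden illustration, not the code from the attachment: the struct and function names are hypothetical, and the CPUID values are passed in as plain integers so the logic can be exercised without executing the cpuid instruction. The bit positions used are the documented ones: SSE is bit 25 of EDX in leaf 1, and the AMD MMX extensions flag is bit 22 of EDX in extended leaf 0x80000001.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_1_EDX_SSE       (1u << 25) /* leaf 1, EDX: SSE */
#define CPUID_EXT1_EDX_MMXEXT (1u << 22) /* leaf 0x80000001, EDX: AMD MMX extensions */

/* Hypothetical container for the CPUID results the check needs. */
struct cpuid_info {
    bool     is_amd;        /* vendor string is "AuthenticAMD" */
    uint32_t leaf1_edx;     /* EDX from leaf 1 */
    uint32_t max_ext_leaf;  /* EAX from leaf 0x80000000 */
    uint32_t ext1_edx;      /* EDX from leaf 0x80000001, if available */
};

static bool have_sse1(const struct cpuid_info *ci)
{
    assert(ci != NULL);
    return (ci->leaf1_edx & CPUID_1_EDX_SSE) != 0;
}

/* mmxext is available if full sse1 is present, or if an AMD processor
   reports the extension flag in the extended feature leaf. */
static bool have_mmxext(const struct cpuid_info *ci)
{
    assert(ci != NULL);
    if (have_sse1(ci))
        return true;
    if (!ci->is_amd || ci->max_ext_leaf < 0x80000001u)
        return false;
    return (ci->ext1_edx & CPUID_EXT1_EDX_MMXEXT) != 0;
}
```

An Athlon "Classic" style input (AMD vendor, no SSE bit, extension bit set) yields mmxext without sse1, which is the configuration this report is about.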
Created attachment 81961 [details]
Alternative VEX patch to support mmxext subset.

Instead of using gotos to jump through the sse1 instruction subset in the parser, this alternative VEX patch groups all mmxext instructions together in one block.
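As a rough picture of what "one block" means, the sketch below recognizes the mmxext subset in a single switch. It is a simplified stand-in and not the VEX decoder: the function name is hypothetical, prefixes are ignored (a 0x66 prefix would select the sse2 forms of several of these opcodes), and it only classifies the instruction instead of generating IR. The opcode bytes are the standard two-byte 0F xx encodings of the instructions listed in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the unprefixed bytes at insn start an instruction
   from the mmxext subset. Every instruction in the subset has a ModRM
   byte, so at least three bytes are needed. */
static bool is_mmxext_insn(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);
    if (len < 3 || insn[0] != 0x0F)
        return false;

    uint8_t op  = insn[1];
    uint8_t mod = insn[2] >> 6;        /* ModRM.mod: 3 means register operand */
    uint8_t reg = (insn[2] >> 3) & 7;  /* ModRM.reg: opcode extension for 18/AE */

    switch (op) {
    case 0x70: /* PSHUFW  */
    case 0xC4: /* PINSRW  */
    case 0xDA: /* PMINUB  */
    case 0xDE: /* PMAXUB  */
    case 0xE0: /* PAVGB   */
    case 0xE3: /* PAVGW   */
    case 0xE4: /* PMULHUW */
    case 0xEA: /* PMINSW  */
    case 0xEE: /* PMAXSW  */
    case 0xF6: /* PSADBW  */
        return true;
    case 0xC5: /* PEXTRW   */
    case 0xD7: /* PMOVMSKB */
    case 0xF7: /* MASKMOVQ */
        return mod == 3;               /* register source only */
    case 0xE7: /* MOVNTQ */
        return mod != 3;               /* memory destination only */
    case 0x18: /* PREFETCHNTA/T0/T1/T2 are /0 to /3 */
        return mod != 3 && reg <= 3;
    case 0xAE: /* SFENCE is /7 with mod == 3 */
        return mod == 3 && reg == 7;
    default:
        return false;
    }
}
```

Keeping the subset in one place like this means a decoder can guard the whole block with a single capability check, and everything else in sse1 stays behind the stricter check.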
Used the second VEX patch after review from Julian.
VEX: r2745
valgrind: r13515