Valgrind performs a transition to MMX state on cvtpi2ps with a memory source operand. This transition is not observed on real CPUs (tested on Haswell, Core 2 Duo and AMD K8, at least). According to Intel's Instruction Reference Manual from September 2015, Valgrind's behaviour is correct. An ancient version from 1999 (http://www.c-jump.com/CIS77/reference/Intel/CIS77_24319102/pg_0162.htm), however, describes the observed behaviour. For cvtpi2pd the observed behaviour is actually documented, see bug #210264 and vex commit r1961. I'm not 100% sure Valgrind should deviate from the documentation, so a clarification from Intel or AMD would be helpful.

Reproducible: Always

Steps to Reproduce:
1. gcc -o cvtpi2ps_test cvtpi2ps_test.c
2. valgrind ./cvtpi2ps_test

Actual Results:
prints "cvtpi2ps caused a transition to MMX state"

Expected Results:
prints nothing
Created attachment 96265: simplified test case from libav's checkasm
I'm not sure your test program is correct. The tag word is 16 bits, at byte offsets 8 and 9, but the program tests fenv[9] and fenv[10]. That said, even after changing the 9 and 10 to 8 and 9, it still gives different results natively vs under Valgrind. So something's up here. Is this just a curiosity, or is it causing a problem for you?
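To make the offset point concrete, here is a minimal sketch of reading the tag word out of a 28-byte FNSTENV image. It is not the attached test case: the helper names x87_tag_word and x87_all_empty and the two constants are made up for illustration, and the layout assumed is the 32-bit protected-mode one (control word at offset 0, status word at offset 4, tag word at offset 8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 28-byte FNSTENV image, 32-bit protected-mode format.
 * Control word at byte 0, status word at byte 4, tag word at byte 8. */
enum { X87_ENV_SIZE = 28, X87_TAG_WORD_OFFSET = 8 };

/* Hypothetical helper: build the 16-bit little-endian tag word
 * from bytes 8 and 9 of the saved environment. */
static uint16_t x87_tag_word(const uint8_t *env)
{
    assert(env != NULL);
    return (uint16_t)(env[X87_TAG_WORD_OFFSET] |
                      (env[X87_TAG_WORD_OFFSET + 1] << 8));
}

/* A tag word of 0xFFFF means all eight x87 registers are tagged empty,
 * which is the state left behind by emms. Anything else means at least
 * one register is in use, as after a transition to MMX state. */
static int x87_all_empty(const uint8_t *env)
{
    return x87_tag_word(env) == 0xFFFF;
}
```

Testing bytes 9 and 10 instead would mix the high byte of the tag word with a reserved byte, so the check could pass or fail for the wrong reason.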
If I had to guess, I would say that the Sept 2015 Intel docs are wrong, and that this instruction (cvtpi2ps) should behave the same way as cvtpi2pd does: a transition to MMX state happens only if the source is an MMX register, not when it is a memory operand. Unfortunately the AMD docs I have don't say anything at all about it.
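A rough way to probe that guess is to run both source forms and look at the tag word afterwards. The sketch below assumes GCC-style inline asm on x86-64; the function name cvtpi2ps_enters_mmx_state is hypothetical and this is not the attached test case. It returns 1 if the tag word is no longer 0xFFFF after the instruction, 0 if it is, and -1 on targets where the probe is not compiled in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical probe: does cvtpi2ps leave the x87 unit in MMX state?
 * use_register_source != 0 : source is %mm0 (loaded with movq first)
 * use_register_source == 0 : source is a 64-bit memory operand */
static int cvtpi2ps_enters_mmx_state(int use_register_source)
{
#if defined(__x86_64__) && defined(__GNUC__)
    struct { uint8_t bytes[28]; } env;       /* FNSTENV image */
    uint64_t src = 0x0000000200000001ULL;    /* packed int32 values 1 and 2 */
    uint16_t tag;

    memset(&env, 0, sizeof env);
    if (use_register_source) {
        __asm__ volatile (
            "emms\n\t"                       /* start with all tags empty */
            "movq %1, %%mm0\n\t"             /* this alone enters MMX state */
            "cvtpi2ps %%mm0, %%xmm0\n\t"
            "fnstenv %0\n\t"                 /* save environment (masks exceptions) */
            "fldenv %0\n\t"                  /* restore the control word */
            "emms"                           /* leave the FPU clean */
            : "+m"(env) : "m"(src) : "mm0", "xmm0");
    } else {
        __asm__ volatile (
            "emms\n\t"
            "cvtpi2ps %1, %%xmm0\n\t"        /* memory source, no MMX register */
            "fnstenv %0\n\t"
            "fldenv %0\n\t"
            "emms"
            : "+m"(env) : "m"(src) : "xmm0");
    }
    memcpy(&tag, env.bytes + 8, sizeof tag); /* tag word is bytes 8 and 9 */
    return tag != 0xFFFF;
#else
    (void)use_register_source;
    return -1;                               /* probe not available here */
#endif
}
```

The register form should report 1 everywhere, since the movq into %mm0 already touches an MMX register. For the memory form, the CPUs mentioned in this report are said to give 0, while Valgrind gave 1 at the time of the report.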
(In reply to Julian Seward from comment #2)
> I'm not sure your test program is correct. The tag word is 16 bits
> at byte offsets 8 and 9, but the program tests fenv[9] and [10].
>
> That said .. even after changing the 9 and 10 to 8 and 9, it still
> gives different results natively vs on V. So something's up here.

Oops, yes, the sample program is wrong, but the real check uses the correct offset: https://git.libav.org/?p=libav.git;a=blob;f=tests/checkasm/x86/checkasm.asm;h=55212fc24b3be71f25eb3e9f8066bd2cee1c5eef;hb=HEAD#l227

> Is this just a curiosity, or is it causing a problem for you?

It's more than curiosity. We added tests for handwritten asm in libav (see tests/checkasm/). They also check whether the asm follows the calling convention, i.e. that it restores callee-saved registers, makes no assumptions about the upper half of int arguments on 64-bit targets, and restores the FPU state properly. The last check failed under Valgrind on a function that uses cvtpi2ps with a memory operand and no other MMX instructions. It only affects a function targeting SSE, which is only used if SSE2 is not available, so I added the emms in https://git.libav.org/?p=libav.git;a=commitdiff;h=8563f9887194b07c972c3475d6b51592d77f73f7 . So it's not really a problem for us, although there is still the issue that Valgrind's behaviour differs from all CPUs I tested.
Fixed as described in comment #3. Janne, thanks for spotting this.