Bug 357059 - x86: SSE cvtpi2ps with memory source does transition to MMX state
Summary: x86: SSE cvtpi2ps with memory source does transition to MMX state
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex
Version: 3.10.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-22 17:29 UTC by Janne Grunau
Modified: 2016-10-19 16:01 UTC
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
simplified test case from libav's checkasm (411 bytes, text/x-csrc)
2015-12-22 17:31 UTC, Janne Grunau

Description Janne Grunau 2015-12-22 17:29:23 UTC
Valgrind performs a transition to MMX state when executing cvtpi2ps with a memory source operand. This transition is not observed on real CPUs (tested on at least Haswell, Core 2 Duo, and AMD K8).

According to Intel's instruction set reference manual from September 2015, Valgrind's behaviour is correct. An old version from 1999 (http://www.c-jump.com/CIS77/reference/Intel/CIS77_24319102/pg_0162.htm), however, describes the observed behaviour.

For cvtpi2pd, the observed behaviour is actually documented; see bug #210264 and vex commit r1961.

I'm not 100% sure Valgrind should deviate from the documentation, so a clarification from Intel or AMD would be helpful.

Reproducible: Always

Steps to Reproduce:
1. gcc -o cvtpi2ps_test cvtpi2ps_test.c
2. valgrind ./cvtpi2ps_test

Actual Results:  
prints "cvtpi2ps caused a transition to MMX state"

Expected Results:  
prints nothing
Comment 1 Janne Grunau 2015-12-22 17:31:04 UTC
Created attachment 96265 [details]
simplified test case from libav's checkasm
Comment 2 Julian Seward 2016-09-19 21:04:33 UTC
I'm not sure your test program is correct.  The tag word is 16 bits
at byte offsets 8 and 9, but the program tests fenv[9] and [10].

That said, even after changing the 9 and 10 to 8 and 9, it still
gives different results natively and on Valgrind.  So something's up here.

Is this just a curiosity, or is it causing a problem for you?
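A minimal C sketch of the kind of check discussed here, reading the tag word from bytes 8 and 9 of the 28-byte FNSTENV image (not bytes 9 and 10). The names `ftw_from_env` and `cvtpi2ps_mem_touches_tags` are hypothetical, this is not the attached test case, and the probe assumes GCC-style inline asm on an x86 target with SSE enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the FNSTENV image with 32-bit operand size (also the default in 64-bit mode). */
#define FPU_ENV_SIZE 28

/* Hypothetical helper: return the full 16-bit x87 tag word, stored
   little-endian at byte offsets 8 and 9 of the FNSTENV image. */
uint16_t ftw_from_env(const uint8_t env[FPU_ENV_SIZE]) {
    assert(env != NULL);
    return (uint16_t)(env[8] | (env[9] << 8));
}

#if defined(__SSE__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical probe: returns nonzero if cvtpi2ps with a memory source
   left any x87 register tagged as non-empty (0xFFFF means all empty). */
int cvtpi2ps_mem_touches_tags(void) {
    static const int32_t src[2] = {1, 2};
    uint8_t env[FPU_ENV_SIZE];
    __asm__ volatile(
        "emms\n\t"                /* start with all tags empty */
        "cvtpi2ps %1, %%xmm0\n\t" /* memory source operand */
        "fnstenv %0\n\t"          /* store env; this also masks x87 exceptions */
        "fldenv %0\n\t"           /* restore the control word that fnstenv changed */
        "emms"                    /* leave the x87 tags empty for the compiler */
        : "=m"(env)
        : "m"(src)
        : "xmm0");
    return ftw_from_env(env) != 0xFFFF;
}
#endif
```

Going by the report, the probe would be expected to return 0 natively on the CPUs tested and nonzero under Valgrind 3.10.0 before the fix.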
Comment 3 Julian Seward 2016-09-19 21:18:54 UTC
If I had to guess, I would say that the Sept 2015 Intel docs are wrong,
and that this instruction (cvtpi2ps) should behave the same way as
cvtpi2pd does: a transition to MMX state happens only if the source
is an MMX register, not when it is a memory operand.  Unfortunately,
the AMD docs I have don't say anything at all about it.
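The rule proposed here can be modelled roughly as follows. This is a hypothetical model of the architectural effect only, not Valgrind's VEX code: an MMX state transition sets the x87 TOP field to 0 and marks all eight registers valid (a full tag word of 0x0000), while EMMS marks them all empty (0xFFFF). All type and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the x87 state affected by an MMX transition. */
typedef struct {
    uint16_t tag_word; /* full tag word as stored by FNSTENV; 0xFFFF = all empty */
    unsigned top;      /* x87 TOP field, 0..7 */
} x87_state;

typedef enum { SRC_MMX_REG, SRC_MEMORY } src_kind;

/* Effect of an MMX state transition: TOP := 0, every tag := valid (00). */
void mmx_transition(x87_state *s) {
    s->top = 0;
    s->tag_word = 0x0000;
}

/* Effect of EMMS: every tag := empty (11). */
void model_emms(x87_state *s) {
    s->tag_word = 0xFFFF;
}

/* cvtpi2ps / cvtpi2pd under the proposed rule: only an MMX register
   source causes the transition; a memory source leaves x87 state alone. */
void cvtpi2x(x87_state *s, src_kind src) {
    assert(s != NULL);
    if (src == SRC_MMX_REG)
        mmx_transition(s);
}
```

Under this model, the Sept 2015 wording corresponds to calling `mmx_transition` unconditionally, which is what Valgrind did before the fix.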
Comment 4 Janne Grunau 2016-09-30 10:05:49 UTC
(In reply to Julian Seward from comment #2)
> I'm not sure your test program is correct.  The tag word is 16 bits
> at byte offsets 8 and 9, but the program tests fenv[9] and [10].
> 
> That said .. even after changing the 9 and 10 to 8 and 9, it still
> gives different results natively vs on V.  So something's up here.

Oops, yes, the sample program is wrong, but the real check uses the correct offset: https://git.libav.org/?p=libav.git;a=blob;f=tests/checkasm/x86/checkasm.asm;h=55212fc24b3be71f25eb3e9f8066bd2cee1c5eef;hb=HEAD#l227

> Is this just a curiosity, or is it causing a problem for you?

It's more than a curiosity. We added tests for handwritten asm in libav (see tests/checkasm/). They also check whether the asm follows the calling convention, i.e. that it restores callee-saved registers, makes no assumptions about the upper half of int arguments on 64-bit targets, and leaves the FPU state properly restored. The last check failed under Valgrind on a function using cvtpi2ps with a memory operand and no other MMX usage.

It only affects a function targeting SSE, which is used only when SSE2 is not available, so I added the emms in https://git.libav.org/?p=libav.git;a=commitdiff;h=8563f9887194b07c972c3475d6b51592d77f73f7. So it's not really a problem for us, although Valgrind's behaviour still differs from that of every CPU I tested.
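A sketch of the workaround pattern: convert two int32 values with cvtpi2ps from memory, then issue emms so the x87 tags end up empty whether or not the CPU (or emulator) performed an MMX transition. `int2_to_float2` is a hypothetical helper, not the libav code; the inline asm assumes GCC-style syntax on x86 with SSE, with a plain C fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = (float)in[i] for i = 0, 1. */
void int2_to_float2(const int32_t in[2], float out[2]) {
    assert(in != NULL && out != NULL);
#if defined(__SSE__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "cvtpi2ps %1, %%xmm0\n\t" /* low two floats of xmm0 = converted in[0], in[1] */
        "movlps %%xmm0, %0\n\t"   /* store those two floats to out */
        "emms"                    /* mark all x87 tags empty, whatever the CPU did */
        : "=m"(*(float (*)[2])out)
        : "m"(*(const int32_t (*)[2])in)
        : "xmm0");
#else
    /* Portable fallback for non-x86 targets. */
    out[0] = (float)in[0];
    out[1] = (float)in[1];
#endif
}
```

The trailing emms costs little and makes an FPU-state check like the one in checkasm pass under either interpretation of the manual.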
Comment 5 Julian Seward 2016-10-19 16:01:45 UTC
Fixed as described in comment #3.  Janne, thanks for spotting this.