libvex translates cvtsi2ss into an I64StoF64, followed by an F64toF32. In some cases, this leads to loss of precision which diverges from hardware results. Here's a test case: #include <stdio.h> #include <stdint.h> int main(void) { uint64_t rbx, rcx; rbx = 0x0100000100000001; __asm__ __volatile__( "cvtsi2ss %%rbx,%%xmm2 \n" "movd %%xmm2, %%rcx \n" : "=c" (rcx) : "b" (rbx) : "%xmm2"); printf("rcx: %lx\n", rcx); return 0; } Reproducible: Always Steps to Reproduce: 1. compile test case 2. run on valgrind 3. run natively Actual Results: Valgrind gives: "rcx: 5b800000" Native gives: "rcx: 5b800001" Expected Results: Valgrind should return "rcx: 5b800001".
I guess this happens because the value is rounded twice, as opposed to once when running on real hardware.
Just ran into this myself, here is another repro where results differ when running under valgrind and not: #include <stdio.h> #include <stdlib.h> #include <string.h> int main() { long a = 0x7fffff4000000001; float b = a; unsigned c; memcpy(&c, &b, sizeof(b)); printf("%f %u\n", (double) b, c); return 0; }