Created attachment 141383 [details] Minimal program demonstrating floating-point erroneous behavior with memcheck SUMMARY Floating-point erroneous behavior with memecheck with regards to class (isfinite(), etc.), leading to consider -NaN as a finite number. STEPS TO REPRODUCE 1. Compile the attached reproducible minimal case with clang-12 in at least O2 and -march=corei7-avx clang-12 -std=c++20 -stdlib=libc++ -O2 -march=corei7-avx isfinite-bug-with-clang12-O2-corei7-avx.cpp -lc++ -lm -o isfinite-bug 2. Run without valgrind ./isfinite-bug 3. Run with valgrind (valgrind --tool=memcheck) valgrind --tool=memcheck ./isfinite-bug OBSERVED RESULT Run without valgrind: "All right" is emitted Run with valgrind: "Error: expected 1 and got -nan" is emitted EXPECTED RESULT The valgrind version should always emit "All right" SOFTWARE/OS VERSIONS Linux, Ubuntu, 5.4.0-80-generic ADDITIONAL INFORMATION The issue is reproduced using valgrind-3.15.0 It requires: * clang-12 * At least O2 * -march=corei7-avx It is _not_ reproduced with an earlier version of clang (tested version: clang-11), which hints of a recent special NaN/fp class optimization. At this stage, I can not guarantee that this is a valgrind issue, or a undefined-behavior-generated-code that happen to work without valgrind. I'm available for any additional information.
Bug also present with valgrind-3.18.0.GIT (cadf0432290b0bc147c7b5dd54c63bc94986743c)
Additional notes: isolating the faulty function static double (*volatile pvalidate)(double a) = validate; // used in minimalTest double validate(double a) { return std::isfinite(a) ? a : 1.0f; } The difference between the correctly executed code under valgrind and the faulty one: _Z8validated: # @_Z8validated pushq %rax .cfi_def_cfa_offset 16 - movsd %xmm0, (%rsp) # 8-byte Spill + vmovsd %xmm0, (%rsp) # 8-byte Spill callq _Z8isfiniteIdENSt3__19enable_ifIXaasr3std13is_arithmeticIT_EE5valuesr3std14numeric_limitsIS2_EE12has_infinityEbE4typeES2_ - movsd (%rsp), %xmm0 # 8-byte Reload + vmovsd (%rsp), %xmm0 # 8-byte Reload # xmm0 = mem[0],zero testb %al, %al jne .LBB6_2 # %bb.1: - movsd .LCPI6_0(%rip), %xmm0 # xmm0 = mem[0],zero + vmovsd .LCPI6_0(%rip), %xmm0 # xmm0 = mem[0],zero .LBB6_2: popq %rax retq
Do you know if it is possible to reproduce this issue either with GCC or an older C++ standard, or even both? The reason that I ask is that integrating this testcase, at least on Linux, will be rather difficult. I can do some testing on FreeBSD. Normally I only use the default system compiler (clang 10 on FreeBSD 12 and clang 11 on FreeBSD 13), and my PC is rather old so I'm not sure that -march=corei7-avx will work forme.
(In reply to Xavier Roche from comment #2) > The difference between the correctly executed code under valgrind and the > faulty one: > - movsd %xmm0, (%rsp) # 8-byte Spill > + vmovsd %xmm0, (%rsp) # 8-byte Spill > - movsd (%rsp), %xmm0 # 8-byte Reload > + vmovsd (%rsp), %xmm0 # 8-byte Reload > - movsd .LCPI6_0(%rip), %xmm0 # xmm0 = mem[0],zero > + vmovsd .LCPI6_0(%rip), %xmm0 # xmm0 = mem[0],zero Can you give some more information about why you think the change from movsd to vmovsd causes the error? Also, which one gives correct execution and which doesn't?
Looking at the Intel docs for MOVSD and VMOVSD, and comparing against what guest_amd64_toIR.c implements, I don't see any error in the implementation. Plus, both of those must be at least moderately commonly used, and so if there was really an emulation bug with them we would probably have heard about it long ago. So I find it a bit difficult to believe that this MOVSD/VMOVSD difference is really the root cause of whatever it is that you are seeing.
Created attachment 141820 [details] ASM diff between working program with valgrind (clang-11) and the faulty one (clang12) clang-12 -S -std=c++20 -stdlib=libc++ -O2 -march=corei7-avx isfinite-bug.cpp -o /tmp/isfinite-bug-12.S clang-11 -S -std=c++20 -stdlib=libc++ -O2 -march=corei7-avx isfinite-bug.cpp -o /tmp/isfinite-bug-11.S diff -udb /tmp/isfinite-bug-11.S /tmp/isfinite-bug-12.S > /tmp/isfinite-bug-clang-11-clang-12.diff
(In reply to Julian Seward from comment #4) > (In reply to Xavier Roche from comment #2) > > > The difference between the correctly executed code under valgrind and the > > faulty one: > > > - movsd %xmm0, (%rsp) # 8-byte Spill > > + vmovsd %xmm0, (%rsp) # 8-byte Spill > > Can you give some more information about why you think the change from > movsd to vmovsd causes the error? Also, which one gives correct execution > and which doesn't? The correct one is movsd; the faulty one is vmovsd. This only happens with clang-12 [12.0.0-3ubuntu1~20.04.3] (not clang-11) and only when running valgrind. g++-10 (Ubuntu 10.3.0-1ubuntu1~20.04) does not trigger the problem either. So this is a specific clang-12 issue. As for the reason, I'm not fluent enough on x86-64 assembly to make any educated guess unfortunately. This might have nothing to do with vmovsd
(In reply to Xavier Roche from comment #7) I checked V's {V,}MOVSD implementation more, and still find no problem I think this is unrelated to those insns. If I had to guess, I'd say it *might* be related to the use of + vcmpneq_oqsd .LCPI1_1(%rip), %xmm1, %xmm1 since the implementation is a bit shaky: // 0xC: this isn't really right because it returns all-1s when // either operand is a NaN, and it should return all-0s. case 0x4: XXX(False, False, Iop_CmpEQ32Fx4, True); break; // NEQ_UQ case 0xC: XXX(False, False, Iop_CmpEQ32Fx4, True); break; // NEQ_OQ (from `findSSECmpOp`). But that's just a guess; without a way to reproduce the failure, I think the chances of resolving this is pretty low.
I'll have a try with this tonight to see what I can reproduce.
As I sort of expected, my PC is too old: CPU: Intel(R) Xeon(R) CPU W3520 @ 2.67GHz (2666.73-MHz K8-class CPU) Origin="GenuineIntel" Id=0x106a5 Family=0x6 Model=0x1a Stepping=5 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT> No avx, so I just get sigill.