Bug 442168 - Floating-point erroneous behavior with memcheck with regards to class (isfinite(), etc.)
Summary: Floating-point erroneous behavior with memcheck with regards to class (isfini...
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.15 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-08 11:47 UTC by Xavier Roche
Modified: 2021-09-23 20:13 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Minimal program demonstrating floating-point erroneous behavior with memcheck (1.15 KB, text/x-c++src)
2021-09-08 11:47 UTC, Xavier Roche
Details
ASM diff between working program with valgrind (clang-11) and the faulty one (clang12) (18.18 KB, patch)
2021-09-23 12:02 UTC, Xavier Roche
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Xavier Roche 2021-09-08 11:47:20 UTC
Created attachment 141383 [details]
Minimal program demonstrating floating-point erroneous behavior with memcheck

SUMMARY

Floating-point erroneous behavior with memecheck with regards to class (isfinite(), etc.), leading to consider -NaN as a finite number.


STEPS TO REPRODUCE
1. Compile the attached reproducible minimal case with clang-12 in at least O2 and -march=corei7-avx

clang-12 -std=c++20 -stdlib=libc++ -O2 -march=corei7-avx isfinite-bug-with-clang12-O2-corei7-avx.cpp -lc++ -lm -o isfinite-bug

2. Run without valgrind

./isfinite-bug

3. Run with valgrind (valgrind --tool=memcheck)

valgrind --tool=memcheck ./isfinite-bug


OBSERVED RESULT

Run without valgrind: "All right" is emitted

Run with valgrind: "Error: expected 1 and got -nan" is emitted


EXPECTED RESULT

The valgrind version should always emit "All right"


SOFTWARE/OS VERSIONS
Linux, Ubuntu, 5.4.0-80-generic

ADDITIONAL INFORMATION
The issue is reproduced using valgrind-3.15.0

It requires:
* clang-12
* At least O2
* -march=corei7-avx

It is _not_ reproduced with an earlier version of clang (tested version: clang-11), which hints of a recent special NaN/fp class optimization.

At this stage, I can not guarantee that this is a valgrind issue, or a undefined-behavior-generated-code that happen to work without valgrind.

I'm available for any additional information.
Comment 1 Xavier Roche 2021-09-08 12:51:28 UTC
Bug also present with valgrind-3.18.0.GIT (cadf0432290b0bc147c7b5dd54c63bc94986743c)
Comment 2 Xavier Roche 2021-09-08 14:27:14 UTC
Additional notes: isolating the faulty function

static double (*volatile pvalidate)(double a) = validate;  // used in minimalTest
double validate(double a)
{
    return std::isfinite(a) ? a : 1.0f;
}

The difference between the correctly executed code under valgrind and the faulty one:

 _Z8validated:                           # @_Z8validated
        pushq   %rax
        .cfi_def_cfa_offset 16
-       movsd   %xmm0, (%rsp)                   # 8-byte Spill
+       vmovsd  %xmm0, (%rsp)                   # 8-byte Spill
        callq   _Z8isfiniteIdENSt3__19enable_ifIXaasr3std13is_arithmeticIT_EE5valuesr3std14numeric_limitsIS2_EE12has_infinityEbE4typeES2_
-       movsd   (%rsp), %xmm0                   # 8-byte Reload
+       vmovsd  (%rsp), %xmm0                   # 8-byte Reload
                                         # xmm0 = mem[0],zero
        testb   %al, %al
        jne     .LBB6_2
 # %bb.1:
-       movsd   .LCPI6_0(%rip), %xmm0           # xmm0 = mem[0],zero
+       vmovsd  .LCPI6_0(%rip), %xmm0           # xmm0 = mem[0],zero
 .LBB6_2:
        popq    %rax
        retq
Comment 3 Paul Floyd 2021-09-23 09:10:07 UTC
Do you know if it is possible to reproduce this issue either with GCC or an older C++ standard, or even both? The reason that I ask is that integrating this testcase, at least on Linux, will be rather difficult.

I can do some testing on FreeBSD. Normally I only use the default system compiler (clang 10 on FreeBSD 12 and clang 11 on FreeBSD 13), and my PC is rather old so I'm not sure that  -march=corei7-avx will work forme.
Comment 4 Julian Seward 2021-09-23 10:44:28 UTC
(In reply to Xavier Roche from comment #2)

> The difference between the correctly executed code under valgrind and the
> faulty one:

> -       movsd   %xmm0, (%rsp)                   # 8-byte Spill
> +       vmovsd  %xmm0, (%rsp)                   # 8-byte Spill

> -       movsd   (%rsp), %xmm0                   # 8-byte Reload
> +       vmovsd  (%rsp), %xmm0                   # 8-byte Reload

> -       movsd   .LCPI6_0(%rip), %xmm0           # xmm0 = mem[0],zero
> +       vmovsd  .LCPI6_0(%rip), %xmm0           # xmm0 = mem[0],zero

Can you give some more information about why you think the change from
movsd to vmovsd causes the error?  Also, which one gives correct execution
and which doesn't?
Comment 5 Julian Seward 2021-09-23 10:51:28 UTC
Looking at the Intel docs for MOVSD and VMOVSD, and comparing against
what guest_amd64_toIR.c implements, I don't see any error in the 
implementation.  Plus, both of those must be at least moderately commonly
used, and so if there was really an emulation bug with them we would
probably have heard about it long ago.  So I find it a bit difficult to
believe that this MOVSD/VMOVSD difference is really the root cause of
whatever it is that you are seeing.
Comment 6 Xavier Roche 2021-09-23 12:02:19 UTC
Created attachment 141820 [details]
ASM diff between working program with valgrind (clang-11) and the faulty one (clang12)

clang-12 -S -std=c++20 -stdlib=libc++ -O2 -march=corei7-avx isfinite-bug.cpp -o /tmp/isfinite-bug-12.S
clang-11 -S -std=c++20 -stdlib=libc++ -O2 -march=corei7-avx isfinite-bug.cpp -o /tmp/isfinite-bug-11.S
diff -udb /tmp/isfinite-bug-11.S /tmp/isfinite-bug-12.S > /tmp/isfinite-bug-clang-11-clang-12.diff
Comment 7 Xavier Roche 2021-09-23 12:02:39 UTC
(In reply to Julian Seward from comment #4)
> (In reply to Xavier Roche from comment #2)
> 
> > The difference between the correctly executed code under valgrind and the
> > faulty one:
> 
> > -       movsd   %xmm0, (%rsp)                   # 8-byte Spill
> > +       vmovsd  %xmm0, (%rsp)                   # 8-byte Spill
> 
> Can you give some more information about why you think the change from
> movsd to vmovsd causes the error?  Also, which one gives correct execution
> and which doesn't?

The correct one is movsd; the faulty one is vmovsd. This only happens with clang-12 [12.0.0-3ubuntu1~20.04.3] (not clang-11) and only when running valgrind.

g++-10 (Ubuntu 10.3.0-1ubuntu1~20.04) does not trigger the problem either.

So this is a specific clang-12 issue.

As for the reason, I'm not fluent enough on x86-64 assembly to make any educated guess unfortunately. This might have nothing to do with vmovsd
Comment 8 Julian Seward 2021-09-23 13:29:42 UTC
(In reply to Xavier Roche from comment #7)

I checked V's {V,}MOVSD implementation more, and still find no problem
I think this is unrelated to those insns.

If I had to guess, I'd say it *might* be related to the use of

+	vcmpneq_oqsd	.LCPI1_1(%rip), %xmm1, %xmm1

since the implementation is a bit shaky:

  // 0xC: this isn't really right because it returns all-1s when
  // either operand is a NaN, and it should return all-0s.
  case 0x4:  XXX(False, False, Iop_CmpEQ32Fx4, True);  break; // NEQ_UQ
  case 0xC:  XXX(False, False, Iop_CmpEQ32Fx4, True);  break; // NEQ_OQ

(from `findSSECmpOp`).  But that's just a guess; without a way to reproduce
the failure, I think the chances of resolving this is pretty low.
Comment 9 Paul Floyd 2021-09-23 15:54:04 UTC
I'll have a try with this tonight to see what I can reproduce.
Comment 10 Paul Floyd 2021-09-23 20:13:26 UTC
As I sort of expected, my PC is too old:

CPU: Intel(R) Xeon(R) CPU           W3520  @ 2.67GHz (2666.73-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>

No avx, so I just get sigill.