Bug 409429 - False positives at unexpected location due to failure to recognize cmpeq as a dependency breaking idiom
Summary: False positives at unexpected location due to failure to recognize cmpeq as a...
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.11.0
Platform: Ubuntu Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-02 18:46 UTC by Travis Downs
Modified: 2022-04-20 09:11 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Travis Downs 2019-07-02 18:46:00 UTC
Consider the following reproduction case:

#include <immintrin.h>

__attribute__((noinline))
int gather(const int *array, void *p, void *p2) {
    __m256i ret = _mm256_i32gather_epi32((const int*)array, _mm256_setzero_si256(), 4);
    return ret[0];
}

struct bunch_of_chars {
    char stuff[128];
};

int main() {
    int* array = new int[8]{};
    bunch_of_chars x, y = x;
    return gather(array, &x, &y);
}

Here, some copying of uninitialized value happens at y = x. Neither x nor y are used, so this doesn't usually trigger a warning in Valgrind (and indeed such copying of uninitalized values w/o using them is common so avoiding a false positive here is important).

Then, however, we call gather() function, only only operates on zero-initialized array using a gather instruction. We only pass pointers to x and y to avoid the compiler optimizing them away completely (you could do it in other ways too, e.g., assigning them to volatile: in the real example this was derived from some parts of x and y were later used).

We get a valgrind failure as follows:

==13033== Memcheck, a memory error detector
==13033== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13033== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13033== Command: ./bench
==13033== 
==13033== Use of uninitialised value of size 8
==13033==    at 0x400718: _mm256_i32gather_epi32 (avx2intrin.h:1598)
==13033==    by 0x400718: gather(int const*, void*, void*) (main.cpp:6)
==13033==    by 0x400608: main (in /home/tdowns/dev/sorted-intersections/bench)

Here's the assembly for gather function:

0000000000400710 <gather(int const*, void*, void*)>:
  400710:	vpxor  xmm1,xmm1,xmm1
  400714:	vpcmpeqd ymm2,ymm2,ymm2
  400718:	vpgatherdd ymm0,DWORD PTR [rdi+ymm1*4],ymm2
  40071e:	vmovq  rax,xmm0
  400723:	vzeroupper 
  400726:	ret    
  400727:	nop    WORD PTR [rax+rax*1+0x0]

Here's an excerpt of the assembly leading up to the main function:

  40057f:	vmovdqa xmm0,XMMWORD PTR [rsp]
  400584:	vmovdqa xmm1,XMMWORD PTR [rsp+0x10]
  40058a:	mov    rsi,rsp
  40058d:	lea    rdx,[rsp+0x80]
  400595:	vmovdqa xmm2,XMMWORD PTR [rsp+0x20]
  40059b:	vmovdqa xmm3,XMMWORD PTR [rsp+0x30]
  4005a1:	mov    rdi,rax
  4005a4:	vmovdqa xmm4,XMMWORD PTR [rsp+0x40]
  4005aa:	vmovdqa xmm5,XMMWORD PTR [rsp+0x50]
  4005b0:	vmovaps XMMWORD PTR [rsp+0x80],xmm0
  4005b9:	vmovdqa xmm6,XMMWORD PTR [rsp+0x60]
  4005bf:	vmovdqa xmm7,XMMWORD PTR [rsp+0x70]
  4005c5:	vmovaps XMMWORD PTR [rsp+0x90],xmm1
  4005ce:	vmovaps XMMWORD PTR [rsp+0xa0],xmm2
  4005d7:	vmovaps XMMWORD PTR [rsp+0xb0],xmm3
  4005e0:	vmovaps XMMWORD PTR [rsp+0xc0],xmm4
  4005e9:	vmovaps XMMWORD PTR [rsp+0xd0],xmm5
  4005f2:	vmovaps XMMWORD PTR [rsp+0xe0],xmm6
  4005fb:	vmovaps XMMWORD PTR [rsp+0xf0],xmm7
  400604:	call   400710 <gather(int const*, void*, void*)>

We can see the uninitialized copy happening (all the movaps calls). Note that at this point, the xmm2 register is "tainted" since it contains uninitialized value. This is what triggers the false positive: the vgatherdd instruction uses ymm2 as input (the mask register). However, ymm2 was already set to "all ones" by the earlier 'vpcmpeqd ymm2,ymm2,ymm2' instruction, so its value is no longer tainted and has no relationship to the earlier xmm2 value.

You can show that ymm2 is causing the problem by using inline assembly to insert a clearing at the start of gather:

asm volatile ("vpxor %xmm2, %xmm2, %xmm2");

This has no effect on the logic, but now Valgrind is happy.

Valgrind should understand vpcmpeq* with the same register for both inputs as independent of the previous value as this is common.

This issue is very hard to track down because the false positive occurs at an unrelated place, potentially very far away from the original source since the xmm registers may go a long time w/o being used.


SOFTWARE/OS VERSIONS
Linux: Ubuntu 16.04
Valgrind: 3.11
g++: 8.1.0

compiled with:

g++-8 -DNDEBUG -Wall -Wextra -O2 -g -march=haswell -Wno-unused-parameter  -Wno-error=unused-variable -Wno-unknown-pragmas -std=c++14  -c -o main.o main.cpp

although really it's -O2 that matters (so you get the vmovaps-based copy in main.
Comment 1 Philippe Waroquiers 2019-07-05 20:46:21 UTC
For info, reproduced on Ubuntu 19.04 with g++ 8.3.0 and valgrind trunk.
Comment 2 Julian Seward 2019-12-28 16:26:29 UTC
I think I fixed this on the 'grail' (noise-reduction) branch, but it
has not yet been merged to trunk.  That will happen before the next
release.  The commit is:

commit 96de5118f5332ae145912ebe91b8fa143df74b8d
Author: Julian Seward <jseward@acm.org>
Date:   Sat Nov 16 08:30:10 2019 +0100

    Fold Iop_CmpEQ32x8(x,x) to all-1s ..
    
    .. hence treating it as a dependency-breaking idiom.  Also handle the
    resulting IRConst_V256(0xFFFFFFFF) in the amd64 insn selector.
Comment 3 Julian Seward 2020-01-02 07:03:12 UTC
Fixed on the trunk, 79dd0bd6e88a65f435799d5d84165c260c9bbda7.
Comment 4 Étienne Dupuis 2022-04-20 09:11:41 UTC
The following function allows detection of whether valgrind has the bug or not, when compiled with clang + AVX2 enabled:

[[clang::optnone]]
static bool checkValgrindBug409429Fixed()
{
   if (RUNNING_ON_VALGRIND) {
      alignas(32) uint8_t m[32];
      __m256i r = *reinterpret_cast<const __m256i *>(m);
      *reinterpret_cast<__m256i *>(m) = _mm256_cmpeq_epi8(r, r);
      if ((m[0] != 0xFF) || VALGRIND_COUNT_ERRORS)
         return false;
   }

   return true;
}