Bug 264936 - vectorization might lead to 'false positives'
Summary: vectorization might lead to 'false positives'
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.5.0
Platform: Unlisted Binaries Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
Reported: 2011-01-31 12:17 UTC by Joost VandeVondele
Modified: 2020-02-03 02:40 UTC

Description Joost VandeVondele 2011-01-31 12:17:11 UTC
This bug report relates to two (closed invalid) bug reports in gcc bugzilla.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183

PR47522 includes a runnable example in the first comment.

The issue appears to be that vectorization can result in code that loads elements beyond the last element of an allocated array. However, these loads only happen for unaligned data, where access to the last+1 element can't trigger a page fault or other side effects (according to my interpretation of comments by gcc developers), and the loaded values are never used. As such, this is considered valid.

Since this kind of code will be produced increasingly by gcc, especially for numerical codes (whenever vectorization triggers, essentially) it would be great to have this somehow dealt with in valgrind.
Comment 1 Julian Seward 2011-01-31 12:58:55 UTC
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522#c4
>
> I think valgrind should simply special-case these kind of out of bounds
> checks based on the instruction that was used.

Great.  Why don't you tell me then how I am supposed to differentiate
between a vector load that is deliberately out of bounds vs one that is
out of bounds by accident, so I can emit an error for the latter but
not for the former?
Comment 2 Joost VandeVondele 2011-01-31 13:06:33 UTC
(In reply to comment #1)

> Great.  Why don't you tell me then how I am supposed to differentiate
> between a vector load that is deliberately out of bounds vs one that is
> out of bounds by accident, so I can emit an error for the latter but
> not for the former?

Hey.... I'm a user, you're the developer ;-)

I'm really not the right person to ask. I guess there are some signatures: it is a vector load with at least one element that is still part of an allocated array. Additionally, based on alignment, the 'offending load(s)' cannot cross a page boundary. Finally, the loaded byte(s) propagate as uninitialized data, but never trigger the 'used uninitialized' error. I suppose that you might get more details in the gcc bugzilla.
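
The signatures listed above could be written down as a predicate roughly like the one below. This is only a sketch of the idea, not anything Memcheck implements: the function name, its parameters, and the 4096-byte page size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; real code would query it, e.g. via sysconf(). */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical predicate: does a load of `width` bytes at `addr` look like a
 * harmless over-read of the heap block [block, block + size)?
 *   1. the load is naturally aligned, so it cannot straddle a page boundary
 *      as long as width divides the page size;
 *   2. at least the first byte of the load lies inside the block;
 *   3. the load runs past the end of the block. */
static bool looks_like_benign_overread(uintptr_t block, size_t size,
                                       uintptr_t addr, size_t width)
{
    assert(width != 0 && ASSUMED_PAGE_SIZE % width == 0);

    bool aligned   = (addr % width) == 0;
    bool starts_in = addr >= block && (addr - block) < size;
    bool runs_past = starts_in && (size - (addr - block)) < width;
    bool same_page = (addr / ASSUMED_PAGE_SIZE) ==
                     ((addr + width - 1) / ASSUMED_PAGE_SIZE);

    return aligned && starts_in && runs_past && same_page;
}
```

For a 15-byte block at the made-up address 0x1000, an 8-byte load at 0x1008 matches the signature, while loads at 0x1000 (fully in bounds), 0x1010 (fully out of bounds) and 0x1009 (unaligned) do not. The third signature, that the extra bytes are never used, cannot be decided from the address alone.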
Comment 3 Julian Seward 2011-01-31 13:14:09 UTC
Can you objdump -d the loop containing the complained-about load,
and post the results?
Comment 4 Joost VandeVondele 2011-01-31 13:34:08 UTC
So the valgrind message I have is:

==12860== Invalid read of size 8
==12860==    at 0x400A38: integrate_gf_npbc_ (in /data03/vondele/bugs/valgrind/a.out)
==12860==    by 0x40245B: main (in /data03/vondele/bugs/valgrind/a.out)
==12860==  Address 0x58e9e40 is 0 bytes after a block of size 272 alloc'd
==12860==    at 0x4C26C3A: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12860==    by 0x402209: main (in /data03/vondele/bugs/valgrind/a.out)

The corresponding asm from objdump is:

0000000000400720 <integrate_gf_npbc_>:
  400720:       41 57                   push   %r15
  400722:       41 56                   push   %r14
  400724:       41 55                   push   %r13
  400726:       41 54                   push   %r12
  400728:       49 89 fc                mov    %rdi,%r12
  40072b:       31 ff                   xor    %edi,%edi
  40072d:       55                      push   %rbp
  40072e:       53                      push   %rbx
  40072f:       48 83 ec 50             sub    $0x50,%rsp
  400733:       49 63 18                movslq (%r8),%rbx
  400736:       45 8b 09                mov    (%r9),%r9d
  400739:       48 89 54 24 20          mov    %rdx,0x20(%rsp)
  40073e:       49 63 50 04             movslq 0x4(%r8),%rdx
  400742:       48 89 74 24 a0          mov    %rsi,-0x60(%rsp)
  400747:       49 63 70 08             movslq 0x8(%r8),%rsi
  40074b:       48 8b 84 24 b0 00 00    mov    0xb0(%rsp),%rax
  400752:       00
  400753:       48 83 c2 01             add    $0x1,%rdx
  400757:       48 29 da                sub    %rbx,%rdx
  40075a:       48 0f 48 d7             cmovs  %rdi,%rdx
  40075e:       48 89 54 24 f0          mov    %rdx,-0x10(%rsp)
  400763:       49 63 50 0c             movslq 0xc(%r8),%rdx
  400767:       48 8b 6c 24 f0          mov    -0x10(%rsp),%rbp
  40076c:       48 83 c2 01             add    $0x1,%rdx
  400770:       48 29 f2                sub    %rsi,%rdx
  400773:       48 0f af 54 24 f0       imul   -0x10(%rsp),%rdx
  400779:       48 85 d2                test   %rdx,%rdx
  40077c:       48 0f 49 fa             cmovns %rdx,%rdi
  400780:       48 89 da                mov    %rbx,%rdx
  400783:       48 01 db                add    %rbx,%rbx
  400786:       48 0f af ee             imul   %rsi,%rbp
  40078a:       48 89 7c 24 c0          mov    %rdi,-0x40(%rsp)
  40078f:       48 f7 da                neg    %rdx
  400792:       49 63 78 10             movslq 0x10(%r8),%rdi
  400796:       48 01 f6                add    %rsi,%rsi
  400799:       48 f7 d3                not    %rbx
  40079c:       48 f7 d6                not    %rsi
  40079f:       48 89 5c 24 b0          mov    %rbx,-0x50(%rsp)
  4007a4:       44 89 4c 24 cc          mov    %r9d,-0x34(%rsp)
  4007a9:       48 89 74 24 10          mov    %rsi,0x10(%rsp)
  4007ae:       48 8b b4 24 88 00 00    mov    0x88(%rsp),%rsi
  4007b5:       00
  4007b6:       48 29 ea                sub    %rbp,%rdx
  4007b9:       48 8b 6c 24 c0          mov    -0x40(%rsp),%rbp
  4007be:       48 8d 1c 3f             lea    (%rdi,%rdi,1),%rbx
  4007c2:       8b 36                   mov    (%rsi),%esi
  4007c4:       48 0f af ef             imul   %rdi,%rbp
  4007c8:       48 f7 d3                not    %rbx
  4007cb:       89 74 24 08             mov    %esi,0x8(%rsp)
  4007cf:       48 29 ea                sub    %rbp,%rdx
  4007d2:       41 39 f1                cmp    %esi,%r9d
  4007d5:       0f 8f 4d 05 00 00       jg     400d28 <integrate_gf_npbc_+0x608>
  4007db:       48 8b b4 24 90 00 00    mov    0x90(%rsp),%rsi
  4007e2:       00
  4007e3:       48 8b 7c 24 10          mov    0x10(%rsp),%rdi
  4007e8:       4c 8b 74 24 20          mov    0x20(%rsp),%r14
  4007ed:       8b 36                   mov    (%rsi),%esi
  4007ef:       89 74 24 04             mov    %esi,0x4(%rsp)
  4007f3:       48 8b b4 24 98 00 00    mov    0x98(%rsp),%rsi
  4007fa:       00
  4007fb:       8b 36                   mov    (%rsi),%esi
  4007fd:       89 74 24 0c             mov    %esi,0xc(%rsp)
  400801:       83 ee 01                sub    $0x1,%esi
  400804:       89 74 24 1c             mov    %esi,0x1c(%rsp)
  400808:       2b 74 24 04             sub    0x4(%rsp),%esi
  40080c:       d1 ee                   shr    %esi
  40080e:       89 74 24 2c             mov    %esi,0x2c(%rsp)
  400812:       48 63 74 24 0c          movslq 0xc(%rsp),%rsi
  400817:       44 8b 7c 24 2c          mov    0x2c(%rsp),%r15d
  40081c:       48 89 74 24 30          mov    %rsi,0x30(%rsp)
  400821:       49 63 f1                movslq %r9d,%rsi
  400824:       48 8d 5c 73 01          lea    0x1(%rbx,%rsi,2),%rbx
  400829:       48 0f af 74 24 c0       imul   -0x40(%rsp),%rsi
  40082f:       4c 8d 2c d9             lea    (%rcx,%rbx,8),%r13
  400833:       48 8b 4c 24 f0          mov    -0x10(%rsp),%rcx
  400838:       48 01 d6                add    %rdx,%rsi
  40083b:       48 8b 54 24 30          mov    0x30(%rsp),%rdx
  400840:       48 0f af 54 24 f0       imul   -0x10(%rsp),%rdx
  400846:       48 8d 14 16             lea    (%rsi,%rdx,1),%rdx
  40084a:       48 89 54 24 f8          mov    %rdx,-0x8(%rsp)
  40084f:       48 63 54 24 04          movslq 0x4(%rsp),%rdx
  400854:       48 0f af ca             imul   %rdx,%rcx
  400858:       48 8d 14 57             lea    (%rdi,%rdx,2),%rdx
  40085c:       49 8d 14 d6             lea    (%r14,%rdx,8),%rdx
  400860:       48 8d 0c 0e             lea    (%rsi,%rcx,1),%rcx
  400864:       48 89 54 24 38          mov    %rdx,0x38(%rsp)
  400869:       8b 54 24 04             mov    0x4(%rsp),%edx
  40086d:       48 89 4c 24 e0          mov    %rcx,-0x20(%rsp)
  400872:       48 89 4c 24 e8          mov    %rcx,-0x18(%rsp)
  400877:       48 8b 4c 24 30          mov    0x30(%rsp),%rcx
  40087c:       46 8d 7c 7a 01          lea    0x1(%rdx,%r15,2),%r15d
  400881:       44 89 7c 24 44          mov    %r15d,0x44(%rsp)
  400886:       48 8d 4c 4f 01          lea    0x1(%rdi,%rcx,2),%rcx
  40088b:       48 89 4c 24 48          mov    %rcx,0x48(%rsp)
  400890:       8b 5c 24 1c             mov    0x1c(%rsp),%ebx
  400894:       39 5c 24 04             cmp    %ebx,0x4(%rsp)
  400898:       ba ff ff ff 7f          mov    $0x7fffffff,%edx
  40089d:       0f 8f 51 03 00 00       jg     400bf4 <integrate_gf_npbc_+0x4d4>
  4008a3:       48 8b b4 24 a0 00 00    mov    0xa0(%rsp),%rsi
  4008aa:       00
  4008ab:       48 8b bc 24 a8 00 00    mov    0xa8(%rsp),%rdi
  4008b2:       00
  4008b3:       48 8b 4c 24 e8          mov    -0x18(%rsp),%rcx
  4008b8:       4c 8b 74 24 f0          mov    -0x10(%rsp),%r14
  4008bd:       4c 8b 5c 24 f0          mov    -0x10(%rsp),%r11
  4008c2:       4c 03 5c 24 e8          add    -0x18(%rsp),%r11
  4008c7:       8b 36                   mov    (%rsi),%esi
  4008c9:       8b 3f                   mov    (%rdi),%edi
  4008cb:       48 8b 5c 24 b0          mov    -0x50(%rsp),%rbx
  4008d0:       f2 44 0f 10 10          movsd  (%rax),%xmm10
  4008d5:       f2 44 0f 10 48 08       movsd  0x8(%rax),%xmm9
  4008db:       49 c1 e6 04             shl    $0x4,%r14
  4008df:       48 63 d6                movslq %esi,%rdx
  4008e2:       89 74 24 9c             mov    %esi,-0x64(%rsp)
  4008e6:       89 7c 24 98             mov    %edi,-0x68(%rsp)
  4008ea:       48 8d 0c 0a             lea    (%rdx,%rcx,1),%rcx
  4008ee:       f2 44 0f 10 40 10       movsd  0x10(%rax),%xmm8
  4008f4:       4c 89 74 24 88          mov    %r14,-0x78(%rsp)
  4008f9:       4c 8b 74 24 a0          mov    -0x60(%rsp),%r14
  4008fe:       4e 8d 1c 1a             lea    (%rdx,%r11,1),%r11
  400902:       49 8d 34 cc             lea    (%r12,%rcx,8),%rsi
  400906:       8b 4c 24 98             mov    -0x68(%rsp),%ecx
  40090a:       2b 4c 24 9c             sub    -0x64(%rsp),%ecx
  40090e:       48 8d 14 53             lea    (%rbx,%rdx,2),%rdx
  400912:       4c 8b 7c 24 f0          mov    -0x10(%rsp),%r15
  400917:       4c 8b 54 24 f0          mov    -0x10(%rsp),%r10
  40091c:       4c 03 54 24 e0          add    -0x20(%rsp),%r10
  400921:       48 8b 7c 24 38          mov    0x38(%rsp),%rdi
  400926:       49 c1 e3 03             shl    $0x3,%r11
  40092a:       4d 8d 74 d6 10          lea    0x10(%r14,%rdx,8),%r14
  40092f:       4c 8b 4c 24 e0          mov    -0x20(%rsp),%r9
  400934:       44 8b 44 24 2c          mov    0x2c(%rsp),%r8d
  400939:       83 c1 01                add    $0x1,%ecx
  40093c:       4d 01 ff                add    %r15,%r15
  40093f:       48 89 54 24 b8          mov    %rdx,-0x48(%rsp)
  400944:       89 cd                   mov    %ecx,%ebp
  400946:       89 4c 24 a8             mov    %ecx,-0x58(%rsp)
  40094a:       4c 89 7c 24 90          mov    %r15,-0x70(%rsp)
  40094f:       d1 ed                   shr    %ebp
  400951:       4c 89 74 24 d0          mov    %r14,-0x30(%rsp)
  400956:       8d 4c 2d 00             lea    0x0(%rbp,%rbp,1),%ecx
  40095a:       89 4c 24 ac             mov    %ecx,-0x54(%rsp)
  40095e:       03 4c 24 9c             add    -0x64(%rsp),%ecx
  400962:       89 4c 24 dc             mov    %ecx,-0x24(%rsp)
  400966:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40096d:       00 00 00
  400970:       44 8b 7c 24 98          mov    -0x68(%rsp),%r15d
  400975:       44 39 7c 24 9c          cmp    %r15d,-0x64(%rsp)
  40097a:       66 0f 57 ed             xorpd  %xmm5,%xmm5
  40097e:       66 0f 28 c5             movapd %xmm5,%xmm0
  400982:       66 0f 28 fd             movapd %xmm5,%xmm7
  400986:       0f 8f 1e 02 00 00       jg     400baa <integrate_gf_npbc_+0x48a>
  40098c:       8b 54 24 ac             mov    -0x54(%rsp),%edx
  400990:       85 d2                   test   %edx,%edx
  400992:       0f 84 9f 03 00 00       je     400d37 <integrate_gf_npbc_+0x617>
  400998:       83 7c 24 a8 09          cmpl   $0x9,-0x58(%rsp)
  40099d:       0f 86 94 03 00 00       jbe    400d37 <integrate_gf_npbc_+0x617>
  4009a3:       66 0f 57 e4             xorpd  %xmm4,%xmm4
  4009a7:       48 8b 54 24 b8          mov    -0x48(%rsp),%rdx
  4009ac:       4c 8b 74 24 a0          mov    -0x60(%rsp),%r14
  4009b1:       48 8b 5c 24 d0          mov    -0x30(%rsp),%rbx
  4009b6:       4f 8d 3c 1c             lea    (%r12,%r11,1),%r15
  4009ba:       66 0f 28 fc             movapd %xmm4,%xmm7
  4009be:       66 0f 28 ec             movapd %xmm4,%xmm5
  4009c2:       66 44 0f 28 dc          movapd %xmm4,%xmm11
  4009c7:       49 8d 4c d6 08          lea    0x8(%r14,%rdx,8),%rcx
  4009cc:       31 d2                   xor    %edx,%edx
  4009ce:       45 31 f6                xor    %r14d,%r14d
  4009d1:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  4009d8:       f2 44 0f 10 24 16       movsd  (%rsi,%rdx,1),%xmm12
  4009de:       41 83 c6 01             add    $0x1,%r14d
  4009e2:       f2 0f 10 31             movsd  (%rcx),%xmm6
  4009e6:       66 44 0f 16 64 16 08    movhpd 0x8(%rsi,%rdx,1),%xmm12
  4009ed:       f2 41 0f 10 04 17       movsd  (%r15,%rdx,1),%xmm0
  4009f3:       66 0f 16 71 08          movhpd 0x8(%rcx),%xmm6
  4009f8:       66 41 0f 28 dc          movapd %xmm12,%xmm3
  4009fd:       f2 44 0f 10 61 10       movsd  0x10(%rcx),%xmm12
  400a03:       66 0f 28 ce             movapd %xmm6,%xmm1
  400a07:       66 41 0f 16 44 17 08    movhpd 0x8(%r15,%rdx,1),%xmm0
  400a0e:       66 44 0f 16 61 18       movhpd 0x18(%rcx),%xmm12
  400a14:       f2 0f 10 33             movsd  (%rbx),%xmm6
  400a18:       66 0f 28 d0             movapd %xmm0,%xmm2
  400a1c:       48 83 c2 10             add    $0x10,%rdx
  400a20:       66 41 0f 14 cc          unpcklpd %xmm12,%xmm1
  400a25:       66 0f 16 73 08          movhpd 0x8(%rbx),%xmm6
  400a2a:       f2 44 0f 10 63 10       movsd  0x10(%rbx),%xmm12
  400a30:       48 83 c1 20             add    $0x20,%rcx
  400a34:       66 0f 28 c6             movapd %xmm6,%xmm0
  400a38:       66 44 0f 16 63 18       movhpd 0x18(%rbx),%xmm12
  400a3e:       66 0f 28 f1             movapd %xmm1,%xmm6
  400a42:       66 0f 59 ca             mulpd  %xmm2,%xmm1
  400a46:       48 83 c3 20             add    $0x20,%rbx
  400a4a:       41 39 ee                cmp    %ebp,%r14d
  400a4d:       66 41 0f 14 c4          unpcklpd %xmm12,%xmm0
  400a52:       66 0f 59 f3             mulpd  %xmm3,%xmm6
  400a56:       66 0f 59 d8             mulpd  %xmm0,%xmm3
  400a5a:       66 0f 58 f9             addpd  %xmm1,%xmm7
  400a5e:       66 0f 59 c2             mulpd  %xmm2,%xmm0
  400a62:       66 44 0f 58 de          addpd  %xmm6,%xmm11
  400a67:       66 0f 58 eb             addpd  %xmm3,%xmm5
  400a6b:       66 0f 58 e0             addpd  %xmm0,%xmm4
  400a6f:       0f 82 63 ff ff ff       jb     4009d8 <integrate_gf_npbc_+0x2b8>
  400a75:       66 0f 28 c4             movapd %xmm4,%xmm0
  400a79:       8b 54 24 a8             mov    -0x58(%rsp),%edx
  400a7d:       66 44 0f 28 e7          movapd %xmm7,%xmm12
  400a82:       39 54 24 ac             cmp    %edx,-0x54(%rsp)
  400a86:       66 0f 15 c0             unpckhpd %xmm0,%xmm0
  400a8a:       8b 4c 24 dc             mov    -0x24(%rsp),%ecx
  400a8e:       66 45 0f 15 e4          unpckhpd %xmm12,%xmm12
  400a93:       66 0f 28 f0             movapd %xmm0,%xmm6
  400a97:       66 0f 28 c5             movapd %xmm5,%xmm0
  400a9b:       f2 0f 58 f4             addsd  %xmm4,%xmm6
  400a9f:       66 41 0f 28 e4          movapd %xmm12,%xmm4
  400aa4:       66 0f 15 c0             unpckhpd %xmm0,%xmm0
  400aa8:       66 45 0f 28 e3          movapd %xmm11,%xmm12
  400aad:       f2 0f 58 e7             addsd  %xmm7,%xmm4
  400ab1:       66 45 0f 15 e4          unpckhpd %xmm12,%xmm12
  400ab6:       66 0f 28 f8             movapd %xmm0,%xmm7
  400aba:       f2 0f 58 fd             addsd  %xmm5,%xmm7
  400abe:       66 41 0f 28 ec          movapd %xmm12,%xmm5
  400ac3:       f2 41 0f 58 eb          addsd  %xmm11,%xmm5
  400ac8:       0f 84 90 00 00 00       je     400b5e <integrate_gf_npbc_+0x43e>
  400ace:       48 63 d1                movslq %ecx,%rdx
  400ad1:       4c 8b 7c 24 b0          mov    -0x50(%rsp),%r15
  400ad6:       4a 8d 1c 0a             lea    (%rdx,%r9,1),%rbx
  400ada:       4d 8d 34 dc             lea    (%r12,%rbx,8),%r14
  400ade:       4a 8d 1c 12             lea    (%rdx,%r10,1),%rbx
  400ae2:       49 8d 54 57 01          lea    0x1(%r15,%rdx,2),%rdx
  400ae7:       4c 8b 7c 24 a0          mov    -0x60(%rsp),%r15
  400aec:       49 8d 1c dc             lea    (%r12,%rbx,8),%rbx
  400af0:       49 8d 14 d7             lea    (%r15,%rdx,8),%rdx
  400af4:       44 8b 7c 24 98          mov    -0x68(%rsp),%r15d
  400af9:       41 29 cf                sub    %ecx,%r15d
  400afc:       31 c9                   xor    %ecx,%ecx
  400afe:       4e 8d 3c fd 08 00 00    lea    0x8(,%r15,8),%r15
  400b05:       00
  400b06:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400b0d:       00 00 00
  400b10:       f2 41 0f 10 1e          movsd  (%r14),%xmm3
  400b15:       48 83 c1 08             add    $0x8,%rcx
  400b19:       f2 0f 10 0b             movsd  (%rbx),%xmm1
  400b1d:       49 83 c6 08             add    $0x8,%r14
  400b21:       f2 0f 10 12             movsd  (%rdx),%xmm2
  400b25:       48 83 c3 08             add    $0x8,%rbx
  400b29:       f2 0f 10 42 08          movsd  0x8(%rdx),%xmm0
  400b2e:       48 83 c2 10             add    $0x10,%rdx
  400b32:       66 44 0f 28 db          movapd %xmm3,%xmm11
  400b37:       4c 39 f9                cmp    %r15,%rcx
  400b3a:       f2 0f 59 d8             mulsd  %xmm0,%xmm3
  400b3e:       f2 44 0f 59 da          mulsd  %xmm2,%xmm11
  400b43:       f2 0f 59 c1             mulsd  %xmm1,%xmm0
  400b47:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
  400b4b:       f2 0f 58 fb             addsd  %xmm3,%xmm7
  400b4f:       f2 41 0f 58 eb          addsd  %xmm11,%xmm5
  400b54:       f2 0f 58 f0             addsd  %xmm0,%xmm6
  400b58:       f2 0f 58 e2             addsd  %xmm2,%xmm4
  400b5c:       75 b2                   jne    400b10 <integrate_gf_npbc_+0x3f0>
  400b5e:       f2 0f 10 4f 08          movsd  0x8(%rdi),%xmm1
  400b63:       f2 0f 10 57 18          movsd  0x18(%rdi),%xmm2
  400b68:       66 0f 28 dc             movapd %xmm4,%xmm3
  400b6c:       f2 0f 10 47 10          movsd  0x10(%rdi),%xmm0
  400b71:       f2 0f 59 da             mulsd  %xmm2,%xmm3
  400b75:       f2 0f 59 c5             mulsd  %xmm5,%xmm0
  400b79:       f2 0f 59 67 20          mulsd  0x20(%rdi),%xmm4
  400b7e:       f2 0f 59 e9             mulsd  %xmm1,%xmm5
  400b82:       f2 0f 59 f9             mulsd  %xmm1,%xmm7
  400b86:       f2 0f 59 f2             mulsd  %xmm2,%xmm6
  400b8a:       f2 41 0f 10 4d 00       movsd  0x0(%r13),%xmm1
  400b90:       f2 0f 58 eb             addsd  %xmm3,%xmm5
  400b94:       f2 0f 58 c4             addsd  %xmm4,%xmm0
  400b98:       f2 0f 58 fe             addsd  %xmm6,%xmm7
  400b9c:       f2 41 0f 59 6d 08       mulsd  0x8(%r13),%xmm5
  400ba2:       f2 0f 59 c1             mulsd  %xmm1,%xmm0
  400ba6:       f2 0f 59 f9             mulsd  %xmm1,%xmm7
  400baa:       f2 44 0f 58 d7          addsd  %xmm7,%xmm10
  400baf:       48 83 c7 20             add    $0x20,%rdi
  400bb3:       48 03 74 24 88          add    -0x78(%rsp),%rsi
  400bb8:       f2 44 0f 58 c8          addsd  %xmm0,%xmm9
  400bbd:       4c 03 5c 24 88          add    -0x78(%rsp),%r11
  400bc2:       4c 03 4c 24 90          add    -0x70(%rsp),%r9
  400bc7:       f2 44 0f 58 c5          addsd  %xmm5,%xmm8
  400bcc:       4c 03 54 24 90          add    -0x70(%rsp),%r10
  400bd1:       45 85 c0                test   %r8d,%r8d
  400bd4:       f2 44 0f 11 10          movsd  %xmm10,(%rax)
  400bd9:       f2 44 0f 11 48 08       movsd  %xmm9,0x8(%rax)
  400bdf:       f2 44 0f 11 40 10       movsd  %xmm8,0x10(%rax)
  400be5:       74 09                   je     400bf0 <integrate_gf_npbc_+0x4d0>
  400be7:       41 83 e8 01             sub    $0x1,%r8d
  400beb:       e9 80 fd ff ff          jmpq   400970 <integrate_gf_npbc_+0x250>
  400bf0:       8b 54 24 44             mov    0x44(%rsp),%edx
  400bf4:       3b 54 24 0c             cmp    0xc(%rsp),%edx
  400bf8:       0f 84 f5 00 00 00       je     400cf3 <integrate_gf_npbc_+0x5d3>
  400bfe:       48 8b 94 24 a0 00 00    mov    0xa0(%rsp),%rdx
  400c05:       00
  400c06:       48 8b 9c 24 a8 00 00    mov    0xa8(%rsp),%rbx
  400c0d:       00
  400c0e:       66 0f 57 c0             xorpd  %xmm0,%xmm0
  400c12:       8b 0a                   mov    (%rdx),%ecx
  400c14:       8b 33                   mov    (%rbx),%esi
  400c16:       66 0f 28 d0             movapd %xmm0,%xmm2
  400c1a:       66 0f 28 d8             movapd %xmm0,%xmm3
  400c1e:       39 f1                   cmp    %esi,%ecx
  400c20:       0f 8f b1 00 00 00       jg     400cd7 <integrate_gf_npbc_+0x5b7>
  400c26:       48 8b 5c 24 f8          mov    -0x8(%rsp),%rbx
  400c2b:       48 8b 7c 24 b0          mov    -0x50(%rsp),%rdi
  400c30:       48 63 d1                movslq %ecx,%rdx
  400c33:       4c 8b 74 24 a0          mov    -0x60(%rsp),%r14
  400c38:       66 0f 57 c9             xorpd  %xmm1,%xmm1
  400c3c:       29 ce                   sub    %ecx,%esi
  400c3e:       31 c9                   xor    %ecx,%ecx
  400c40:       48 8d 1c 1a             lea    (%rdx,%rbx,1),%rbx
  400c44:       48 8d 54 57 01          lea    0x1(%rdi,%rdx,2),%rdx
  400c49:       48 8d 34 f5 08 00 00    lea    0x8(,%rsi,8),%rsi
  400c50:       00
  400c51:       66 0f 28 d1             movapd %xmm1,%xmm2
  400c55:       49 8d 1c dc             lea    (%r12,%rbx,8),%rbx
  400c59:       49 8d 14 d6             lea    (%r14,%rdx,8),%rdx
  400c5d:       0f 1f 00                nopl   (%rax)
  400c60:       f2 0f 10 03             movsd  (%rbx),%xmm0
  400c64:       48 83 c1 08             add    $0x8,%rcx
  400c68:       f2 0f 10 1a             movsd  (%rdx),%xmm3
  400c6c:       48 83 c3 08             add    $0x8,%rbx
  400c70:       f2 0f 59 d8             mulsd  %xmm0,%xmm3
  400c74:       f2 0f 59 42 08          mulsd  0x8(%rdx),%xmm0
  400c79:       48 83 c2 10             add    $0x10,%rdx
  400c7d:       48 39 f1                cmp    %rsi,%rcx
  400c80:       f2 0f 58 cb             addsd  %xmm3,%xmm1
  400c84:       f2 0f 58 d0             addsd  %xmm0,%xmm2
  400c88:       75 d6                   jne    400c60 <integrate_gf_npbc_+0x540>
  400c8a:       48 8b 54 24 20          mov    0x20(%rsp),%rdx
  400c8f:       4c 8b 7c 24 48          mov    0x48(%rsp),%r15
  400c94:       f2 41 0f 10 65 00       movsd  0x0(%r13),%xmm4
  400c9a:       48 8b 4c 24 30          mov    0x30(%rsp),%rcx
  400c9f:       48 8b 5c 24 10          mov    0x10(%rsp),%rbx
  400ca4:       48 8b 74 24 20          mov    0x20(%rsp),%rsi
  400ca9:       f2 42 0f 10 04 fa       movsd  (%rdx,%r15,8),%xmm0
  400caf:       66 0f 28 d8             movapd %xmm0,%xmm3
  400cb3:       48 8d 54 4b 02          lea    0x2(%rbx,%rcx,2),%rdx
  400cb8:       f2 41 0f 59 45 08       mulsd  0x8(%r13),%xmm0
  400cbe:       f2 0f 59 dc             mulsd  %xmm4,%xmm3
  400cc2:       f2 0f 59 da             mulsd  %xmm2,%xmm3
  400cc6:       f2 0f 10 14 d6          movsd  (%rsi,%rdx,8),%xmm2
  400ccb:       f2 0f 59 c1             mulsd  %xmm1,%xmm0
  400ccf:       f2 0f 59 d4             mulsd  %xmm4,%xmm2
  400cd3:       f2 0f 59 d1             mulsd  %xmm1,%xmm2
  400cd7:       f2 0f 58 18             addsd  (%rax),%xmm3
  400cdb:       f2 0f 58 50 08          addsd  0x8(%rax),%xmm2
  400ce0:       f2 0f 58 40 10          addsd  0x10(%rax),%xmm0
  400ce5:       f2 0f 11 18             movsd  %xmm3,(%rax)
  400ce9:       f2 0f 11 50 08          movsd  %xmm2,0x8(%rax)
  400cee:       f2 0f 11 40 10          movsd  %xmm0,0x10(%rax)
  400cf3:       48 8b 7c 24 c0          mov    -0x40(%rsp),%rdi
  400cf8:       49 83 c5 10             add    $0x10,%r13
  400cfc:       48 01 7c 24 f8          add    %rdi,-0x8(%rsp)
  400d01:       48 01 7c 24 e0          add    %rdi,-0x20(%rsp)
  400d06:       44 8b 74 24 08          mov    0x8(%rsp),%r14d
  400d0b:       48 01 7c 24 e8          add    %rdi,-0x18(%rsp)
  400d10:       44 39 74 24 cc          cmp    %r14d,-0x34(%rsp)
  400d15:       74 11                   je     400d28 <integrate_gf_npbc_+0x608>
  400d17:       83 44 24 cc 01          addl   $0x1,-0x34(%rsp)
  400d1c:       e9 6f fb ff ff          jmpq   400890 <integrate_gf_npbc_+0x170>
  400d21:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  400d28:       48 83 c4 50             add    $0x50,%rsp
  400d2c:       5b                      pop    %rbx
  400d2d:       5d                      pop    %rbp
  400d2e:       41 5c                   pop    %r12
  400d30:       41 5d                   pop    %r13
  400d32:       41 5e                   pop    %r14
  400d34:       41 5f                   pop    %r15
  400d36:       c3                      retq
  400d37:       66 0f 57 e4             xorpd  %xmm4,%xmm4
  400d3b:       8b 4c 24 9c             mov    -0x64(%rsp),%ecx
  400d3f:       66 0f 28 ec             movapd %xmm4,%xmm5
  400d43:       66 0f 28 f4             movapd %xmm4,%xmm6
  400d47:       66 0f 28 fc             movapd %xmm4,%xmm7
  400d4b:       e9 7e fd ff ff          jmpq   400ace <integrate_gf_npbc_+0x3ae>
Comment 5 Julian Seward 2011-01-31 14:03:38 UTC
This seems to me like a bug in gcc.  From the following analysis
(start reading at 0x400a38), the value loaded from memory is never
used -- xmm12 is completely overwritten by subsequent instructions,
either in the post-loop block, or in the first instruction of the
next iteration.

==12860== Invalid read of size 8
==12860==    at 0x400A38: integrate_gf_npbc_

  # def xmm12 (low half loaded, high half zeroed)
  4009d8:       f2 44 0f 10 24 16       movsd  (%rsi,%rdx,1),%xmm12
  4009de:       41 83 c6 01             add    $0x1,%r14d
  4009e2:       f2 0f 10 31             movsd  (%rcx),%xmm6
  4009e6:       66 44 0f 16 64 16 08    movhpd 0x8(%rsi,%rdx,1),%xmm12
  4009ed:       f2 41 0f 10 04 17       movsd  (%r15,%rdx,1),%xmm0
  4009f3:       66 0f 16 71 08          movhpd 0x8(%rcx),%xmm6
  4009f8:       66 41 0f 28 dc          movapd %xmm12,%xmm3
  4009fd:       f2 44 0f 10 61 10       movsd  0x10(%rcx),%xmm12
  400a03:       66 0f 28 ce             movapd %xmm6,%xmm1
  400a07:       66 41 0f 16 44 17 08    movhpd 0x8(%r15,%rdx,1),%xmm0
  400a0e:       66 44 0f 16 61 18       movhpd 0x18(%rcx),%xmm12
  400a14:       f2 0f 10 33             movsd  (%rbx),%xmm6
  400a18:       66 0f 28 d0             movapd %xmm0,%xmm2
  400a1c:       48 83 c2 10             add    $0x10,%rdx
  400a20:       66 41 0f 14 cc          unpcklpd %xmm12,%xmm1
  400a25:       66 0f 16 73 08          movhpd 0x8(%rbx),%xmm6
  400a2a:       f2 44 0f 10 63 10       movsd  0x10(%rbx),%xmm12
  400a30:       48 83 c1 20             add    $0x20,%rcx
  400a34:       66 0f 28 c6             movapd %xmm6,%xmm0

  # load high half xmm12 (error reported here).  low half unchanged.
  400a38:       66 44 0f 16 63 18       movhpd 0x18(%rbx),%xmm12
  400a3e:       66 0f 28 f1             movapd %xmm1,%xmm6
  400a42:       66 0f 59 ca             mulpd  %xmm2,%xmm1
  400a46:       48 83 c3 20             add    $0x20,%rbx
  400a4a:       41 39 ee                cmp    %ebp,%r14d

  # reads low half xmm12 only
  400a4d:       66 41 0f 14 c4          unpcklpd %xmm12,%xmm0
  400a52:       66 0f 59 f3             mulpd  %xmm3,%xmm6
  400a56:       66 0f 59 d8             mulpd  %xmm0,%xmm3
  400a5a:       66 0f 58 f9             addpd  %xmm1,%xmm7
  400a5e:       66 0f 59 c2             mulpd  %xmm2,%xmm0
  400a62:       66 44 0f 58 de          addpd  %xmm6,%xmm11
  400a67:       66 0f 58 eb             addpd  %xmm3,%xmm5
  400a6b:       66 0f 58 e0             addpd  %xmm0,%xmm4
  400a6f:       0f 82 63 ff ff ff       jb     4009d8 # (loop head)

                                        
  400a75:       66 0f 28 c4             movapd %xmm4,%xmm0
  400a79:       8b 54 24 a8             mov    -0x58(%rsp),%edx

  # def xmm12 (overwrite both halves)
  400a7d:       66 44 0f 28 e7          movapd %xmm7,%xmm12
Comment 6 Jakub Jelinek 2011-01-31 14:04:53 UTC
A similar testcase is gcc's own libcpp/lex.c optimization, which can also access a few bytes after the malloced area, as long as at least one byte in the value read is from within the malloced area.  See the search_line_* routines in lex.c: not just the SSE4.2/SSE2 versions, but even the generic C version does this.
I guess valgrind could somehow mark the extra bytes as undefined content and propagate that through the following arithmetic instructions, complaining only if some conditional jump was made solely on the undefined bits or if the undefined bits were stored somewhere (or similar heuristics).
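
As a toy illustration of that suggestion (not a description of how Memcheck tracks definedness internally), the sketch below attaches one "defined" flag to each lane of a two-lane vector. The type `ShadowVec` and all function names are hypothetical. A lane past the end of the array is marked undefined on load, the flag is propagated through arithmetic, and a complaint would only be raised when an undefined lane reaches a store or a conditional jump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a two-lane vector of doubles plus one "defined" flag per lane. */
typedef struct {
    double lane[2];
    bool   defined[2];
} ShadowVec;

/* Load lanes idx and idx+1 from an array holding n valid elements.  A lane
 * past the end is not read at all in this model; it is marked undefined. */
static ShadowVec shadow_load2(const double *a, size_t n, size_t idx)
{
    assert(a != NULL);
    ShadowVec v;
    for (size_t k = 0; k < 2; k++) {
        bool in_bounds = idx + k < n;
        v.lane[k]    = in_bounds ? a[idx + k] : 0.0;
        v.defined[k] = in_bounds;
    }
    return v;
}

/* Lane-wise multiply: a result lane is defined only if both inputs are. */
static ShadowVec shadow_mul(ShadowVec x, ShadowVec y)
{
    ShadowVec r;
    for (size_t k = 0; k < 2; k++) {
        r.lane[k]    = x.lane[k] * y.lane[k];
        r.defined[k] = x.defined[k] && y.defined[k];
    }
    return r;
}

/* Models unpcklpd: result = { dst.low, src.low }; high lanes are ignored. */
static ShadowVec shadow_unpack_lo(ShadowVec dst, ShadowVec src)
{
    ShadowVec r;
    r.lane[0]    = dst.lane[0];
    r.defined[0] = dst.defined[0];
    r.lane[1]    = src.lane[0];
    r.defined[1] = src.defined[0];
    return r;
}

/* A store or a conditional jump uses every lane; complain only here. */
static bool shadow_use_is_clean(ShadowVec v)
{
    return v.defined[0] && v.defined[1];
}
```

Loading the last element of a three-element array this way yields an undefined high lane. Passing that value through `shadow_unpack_lo` produces a fully defined result, so no complaint would be needed, whereas multiplying it and then storing both lanes would be flagged.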
Comment 7 Joost VandeVondele 2011-01-31 14:22:34 UTC
(In reply to comment #5)
> This seems to me like a bug in gcc.  

Unfortunately, I'm an asm novice, so I can't tell. I see Jakub is on the CC as well, so maybe he can judge?

Alternatively, I can reopen 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522
and refer here?
Comment 8 Julian Seward 2011-01-31 14:43:59 UTC
(In reply to comment #6)
> Similar testcase is gcc's own libcpp/lex.c optimization, which also can access
> a few bytes after malloced area, as long as at least one byte in the value read
> is from within the malloced area.

Those loops are (effectively) vectorised while loops, in which you use
standard carry-chain propagation tricks to ensure that the stopping
condition for the loop does not rely on the data from beyond the malloced
area.  It is not possible to vectorise them without such over-reading.
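
For readers unfamiliar with the idiom, here is a minimal sketch of such a loop: a word-at-a-time strlen using the classic zero-byte bit trick. It is not the code from lex.c or glibc; the function names are made up, and the sketch assumes an 8-byte-aligned, padded buffer so that it remains valid C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of w is zero (carry-chain bit trick). */
static uint64_t has_zero_byte(uint64_t w)
{
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Word-at-a-time strlen sketch.  `s` must be 8-byte aligned, so each 8-byte
 * load stays within one page.  In this sketch the caller's buffer must also
 * be padded to a multiple of 8 bytes, which keeps the loads inside the
 * object.  Bytes after the terminator are loaded, but they never influence
 * the result. */
static size_t strlen_by_words(const char *s)
{
    assert(((uintptr_t)s % 8) == 0);

    const char *p = s;
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* aligned 8-byte load */
        if (has_zero_byte(w))
            break;                 /* the terminator is somewhere in this word */
        p += 8;
    }
    while (*p != '\0')             /* find the terminator within the last word */
        p++;
    return (size_t)(p - s);
}
```

The stopping condition depends only on whether a zero byte exists in the word, and the final scan stops at the first zero byte, so whatever follows the terminator in the last word is irrelevant to the result. An optimized version applied to an exactly sized malloc block would load those trailing bytes from beyond the allocation, which is the over-read being discussed.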

By contrast, Joost's loop (and anything gcc can vectorise) are countable
loops: the trip count is known (at run time) before the loop begins.  It
is always possible to vectorise such a loop without generating memory
over reads, by having a vector loop to do (trip_count / vector_width)
iterations, and a scalar fixup loop to do the final (trip_count % vector_width)
iterations.
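
A minimal sketch of that structure, written in plain C rather than with SSE intrinsics: the "vector" loop handles (n / VECTOR_WIDTH) full groups, and a scalar fixup loop handles the remaining (n % VECTOR_WIDTH) elements. The function name and the choice of a dot product are assumptions for illustration; this is not the loop from the test case.

```c
#include <assert.h>
#include <stddef.h>

#define VECTOR_WIDTH 2   /* two doubles fit in one 128-bit SSE2 register */

/* Dot product of a[0..n) and b[0..n) that never reads past element n-1. */
static double dot_no_overread(const double *a, const double *b, size_t n)
{
    assert((a != NULL && b != NULL) || n == 0);

    size_t vec_iters = n / VECTOR_WIDTH;
    double acc[VECTOR_WIDTH] = {0.0, 0.0};
    size_t i = 0;

    /* "Vector" loop: each iteration consumes VECTOR_WIDTH full lanes. */
    for (size_t it = 0; it < vec_iters; it++, i += VECTOR_WIDTH)
        for (size_t k = 0; k < VECTOR_WIDTH; k++)
            acc[k] += a[i + k] * b[i + k];

    /* Horizontal add of the per-lane partial sums. */
    double sum = acc[0] + acc[1];

    /* Scalar fixup loop: the final n % VECTOR_WIDTH elements. */
    for (; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

With n = 5 the vector loop runs twice (elements 0 to 3) and the fixup loop handles element 4, so no access ever touches index 5.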


> I guess valgrind could mark somehow the extra bytes as undefined content and
> propagate it through following arithmetic instructions, complain only if some
> conditional jump was made solely on the undefined bits or if the undefined bits
> were stored somewhere (or similar heuristics).

Well, maybe .. but Memcheck is too slow already.  I don't want to junk it up 
with expensive and complicated heuristics that are irrelevant for 99.9% of
the loads it will encounter.

If you can show me some way to identify just the loads that need special
treatment, then maybe.  I don't see how to identify them, though.
Comment 9 Jakub Jelinek 2011-02-18 18:28:20 UTC
Another simple testcase: https://bugzilla.redhat.com/show_bug.cgi?id=678518
I don't think 99% above is the right figure, at least with recent gcc generated code these false positives are just way too common.  We disable a bunch of them in glibc through a suppression file or overloading the strops implementations,
but when gcc inlines those there is no way to get rid of the false positives.

Can't valgrind just start tracking in more detail whether the bytes are actually used or not when memcheck sees a suspect read (in most cases just an aligned read where at least the first byte is still in the allocated region and perhaps some further ones aren't)?  Then force retranslation of the bb it was used in, or something similar?
Comment 10 Julian Seward 2011-02-18 19:28:18 UTC
(In reply to comment #9)
> I don't think 99% above is the right figure, at least with recent
> gcc generated

What version of gcc?
Comment 11 Jakub Jelinek 2011-02-18 19:47:47 UTC
In the testcase below, strlen is expanded that way with GCC 4.6 (currently used e.g. in Fedora 15) with default options; 4.5 and even earlier versions expand it the same way with -O2 -minline-all-stringops.

#include <stdlib.h>
#include <string.h>

__attribute__((noinline)) void
foo (void *p)
{
  memcpy (p, "0123456789abcd", 15);
}

int
main (void)
{
  void *p = malloc (15);
  foo (p);
  return strlen (p) - 14;
}
Comment 12 Julian Seward 2011-02-18 20:02:46 UTC
I can see this problem isn't going to go away (alas); and we are
seeing similar things on icc generated code.  I'll look into it,
but that won't happen for at least a couple of weeks.
Comment 13 Patrick J. LoPresti 2012-02-17 16:16:17 UTC
Isn't this exactly the problem that "--partial-loads-ok" is meant to address? (cf. bug 294285)

http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok
Comment 14 kapare 2012-06-14 20:01:27 UTC
Could bug 301922 be the same issue?
Comment 15 André Offringa 2012-08-05 11:41:38 UTC
Hi all,

I think I'm also seeing false positives because of vectorization, which unfortunately decreases the usefulness of valgrind. Below is a minimal working example that reproduces problems with std::string. The code is basically extracted from a library I was using (casacore 1.5), and in my software it generates a lot of incorrect "invalid read"s, although the library seems to be valid (even if inheriting from string would not be my preferred solution). I hope this example is of use for evaluating the problem further.

#include <string>
#include <iostream>
#include <cstring>

#include <malloc.h>

class StringAlt : public std::string
{
public:
  StringAlt(const char *c) : std::string(c) { }
  void operator=(const char *c) { std::string::operator=(c);  }
};

typedef StringAlt StringImp;
//typedef std::string StringImp; //<-- replacing prev with this also solves issue

int main(int argc, char *argv[])
{
  const char *a1 = "blaaa";
  char *a2 = strdup(a1);
  a2[2] = 0;
  StringImp s(a1);
  std::cout << "Assign A2\n";
  s = a2;
  std::cout << s << '\n';

  std::cout << "Assign A1\n";
  s = a1;
  std::cout << s << '\n';

  char *a3 = strdup(s.c_str());
  std::cout << "Assign A3\n";
  s = a3;
  std::cout << s << '\n';

  free(a2);
  free(a3);
}

Compiling with g++ (Debian 4.7.1-2) with "-O2" or "-O3" results in the error below. With "-O0", it works fine. Changing the order of statements can also cause the error to disappear, which makes it very hard to debug. Output:

Assign A2
bl
Assign A1
blaaa
Assign A3
==20872== Invalid read of size 4
==20872==    at 0x400C5C: main (in /home/anoko/projects/test/test)
==20872==  Address 0x59550f4 is 4 bytes inside a block of size 6 alloc'd
==20872==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
==20872==    by 0x564D911: strdup (strdup.c:43)
==20872==    by 0x400C46: main (in /home/anoko/projects/test/test)
==20872==
blaaa
Comment 16 Kamil Dudka 2012-08-05 13:34:21 UTC
(In reply to comment #15)
Try rebuilding the library with -fno-builtin-strdup; chances are it will make valgrind work again.
Comment 17 André Offringa 2012-08-05 17:15:02 UTC
-fno-builtin-strdup does indeed get rid of the valgrind message.