This bug report relates to two (closed as invalid) bug reports in the gcc bugzilla:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183

PR47522 includes a runnable example in the first comment. The issue appears to be that vectorization can produce code that loads elements beyond the last element of an allocated array. However, these loads happen only where the data alignment guarantees that access to the last+1 element cannot trigger a page fault or other side effects (according to my interpretation of comments by the gcc developers), and the loaded values are never used. As such, gcc considers this valid code. Since gcc will produce this kind of code more and more often, especially for numerical codes (essentially whenever vectorization triggers), it would be great to have this handled somehow in valgrind.
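For concreteness, here is a small self-contained sketch of the kind of loop being discussed (my own illustration, not the runnable example attached to PR47522): a countable reduction loop over malloc'd double arrays whose length is not a multiple of the vector width. Depending on the gcc version and flags (e.g. -O3 on x86-64), the vectorised body may use 16-byte (two-double) loads whose upper half falls just past the end of the allocated block, which is exactly what memcheck then reports as an invalid read.

/* Illustrative sketch only -- not the PR47522 testcase.  With n odd, a
 * vectorising compiler may load a[16]/a[17] as one 16-byte pair even
 * though only a[16] exists, discarding the out-of-bounds half.  */
#include <stdlib.h>

double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)   /* countable loop: trip count known on entry */
        s += a[i] * b[i];
    return s;
}

int main(void)
{
    int n = 17;                               /* odd element count */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 1.0; }
    double s = dot(a, b, n);
    free(a); free(b);
    return s == 136.0 ? 0 : 1;
}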
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522#c4
>
> I think valgrind should simply special-case these kind of out of bounds
> checks based on the instruction that was used.

Great. Why don't you tell me then how I am supposed to differentiate between a vector load that is deliberately out of bounds vs one that is out of bounds by accident, so I can emit an error for the latter but not for the former?
(In reply to comment #1)
> Great. Why don't you tell me then how I am supposed to differentiate
> between a vector load that is deliberately out of bounds vs one that is
> out of bounds by accident, so I can emit an error for the latter but
> not for the former?

Hey... I'm a user, you're the developer ;-) I'm really not the right person to ask. I can think of a few signatures, though: it is a vector load with at least one element that is still part of an allocated array. Additionally, because of its alignment, the offending load cannot cross a page boundary. Finally, the loaded byte(s) propagate as uninitialized data, but never trigger a 'use of uninitialised value' error. I suppose you might get more details in the gcc bugzilla.
Can you objdump -d the loop containing the complained-about load, and post the results?
So the valgrind message I have is: ==12860== Invalid read of size 8 ==12860== at 0x400A38: integrate_gf_npbc_ (in /data03/vondele/bugs/valgrind/a.out) ==12860== by 0x40245B: main (in /data03/vondele/bugs/valgrind/a.out) ==12860== Address 0x58e9e40 is 0 bytes after a block of size 272 alloc'd ==12860== at 0x4C26C3A: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==12860== by 0x402209: main (in /data03/vondele/bugs/valgrind/a.out) The corresponding asm from objdump is: 0000000000400720 <integrate_gf_npbc_>: 400720: 41 57 push %r15 400722: 41 56 push %r14 400724: 41 55 push %r13 400726: 41 54 push %r12 400728: 49 89 fc mov %rdi,%r12 40072b: 31 ff xor %edi,%edi 40072d: 55 push %rbp 40072e: 53 push %rbx 40072f: 48 83 ec 50 sub $0x50,%rsp 400733: 49 63 18 movslq (%r8),%rbx 400736: 45 8b 09 mov (%r9),%r9d 400739: 48 89 54 24 20 mov %rdx,0x20(%rsp) 40073e: 49 63 50 04 movslq 0x4(%r8),%rdx 400742: 48 89 74 24 a0 mov %rsi,-0x60(%rsp) 400747: 49 63 70 08 movslq 0x8(%r8),%rsi 40074b: 48 8b 84 24 b0 00 00 mov 0xb0(%rsp),%rax 400752: 00 400753: 48 83 c2 01 add $0x1,%rdx 400757: 48 29 da sub %rbx,%rdx 40075a: 48 0f 48 d7 cmovs %rdi,%rdx 40075e: 48 89 54 24 f0 mov %rdx,-0x10(%rsp) 400763: 49 63 50 0c movslq 0xc(%r8),%rdx 400767: 48 8b 6c 24 f0 mov -0x10(%rsp),%rbp 40076c: 48 83 c2 01 add $0x1,%rdx 400770: 48 29 f2 sub %rsi,%rdx 400773: 48 0f af 54 24 f0 imul -0x10(%rsp),%rdx 400779: 48 85 d2 test %rdx,%rdx 40077c: 48 0f 49 fa cmovns %rdx,%rdi 400780: 48 89 da mov %rbx,%rdx 400783: 48 01 db add %rbx,%rbx 400786: 48 0f af ee imul %rsi,%rbp 40078a: 48 89 7c 24 c0 mov %rdi,-0x40(%rsp) 40078f: 48 f7 da neg %rdx 400792: 49 63 78 10 movslq 0x10(%r8),%rdi 400796: 48 01 f6 add %rsi,%rsi 400799: 48 f7 d3 not %rbx 40079c: 48 f7 d6 not %rsi 40079f: 48 89 5c 24 b0 mov %rbx,-0x50(%rsp) 4007a4: 44 89 4c 24 cc mov %r9d,-0x34(%rsp) 4007a9: 48 89 74 24 10 mov %rsi,0x10(%rsp) 4007ae: 48 8b b4 24 88 00 00 mov 0x88(%rsp),%rsi 4007b5: 00 4007b6: 48 29 ea sub %rbp,%rdx 4007b9: 48 8b 6c 24 c0 mov -0x40(%rsp),%rbp 4007be: 48 8d 1c 3f lea (%rdi,%rdi,1),%rbx 4007c2: 8b 36 mov (%rsi),%esi 4007c4: 48 0f af ef imul %rdi,%rbp 4007c8: 48 f7 d3 not %rbx 4007cb: 89 74 24 08 mov %esi,0x8(%rsp) 4007cf: 48 29 ea sub %rbp,%rdx 4007d2: 41 39 f1 cmp %esi,%r9d 4007d5: 0f 8f 4d 05 00 00 jg 400d28 <integrate_gf_npbc_+0x608> 4007db: 48 8b b4 24 90 00 00 mov 0x90(%rsp),%rsi 4007e2: 00 4007e3: 48 8b 7c 24 10 mov 0x10(%rsp),%rdi 4007e8: 4c 8b 74 24 20 mov 0x20(%rsp),%r14 4007ed: 8b 36 mov (%rsi),%esi 4007ef: 89 74 24 04 mov %esi,0x4(%rsp) 4007f3: 48 8b b4 24 98 00 00 mov 0x98(%rsp),%rsi 4007fa: 00 4007fb: 8b 36 mov (%rsi),%esi 4007fd: 89 74 24 0c mov %esi,0xc(%rsp) 400801: 83 ee 01 sub $0x1,%esi 400804: 89 74 24 1c mov %esi,0x1c(%rsp) 400808: 2b 74 24 04 sub 0x4(%rsp),%esi 40080c: d1 ee shr %esi 40080e: 89 74 24 2c mov %esi,0x2c(%rsp) 400812: 48 63 74 24 0c movslq 0xc(%rsp),%rsi 400817: 44 8b 7c 24 2c mov 0x2c(%rsp),%r15d 40081c: 48 89 74 24 30 mov %rsi,0x30(%rsp) 400821: 49 63 f1 movslq %r9d,%rsi 400824: 48 8d 5c 73 01 lea 0x1(%rbx,%rsi,2),%rbx 400829: 48 0f af 74 24 c0 imul -0x40(%rsp),%rsi 40082f: 4c 8d 2c d9 lea (%rcx,%rbx,8),%r13 400833: 48 8b 4c 24 f0 mov -0x10(%rsp),%rcx 400838: 48 01 d6 add %rdx,%rsi 40083b: 48 8b 54 24 30 mov 0x30(%rsp),%rdx 400840: 48 0f af 54 24 f0 imul -0x10(%rsp),%rdx 400846: 48 8d 14 16 lea (%rsi,%rdx,1),%rdx 40084a: 48 89 54 24 f8 mov %rdx,-0x8(%rsp) 40084f: 48 63 54 24 04 movslq 0x4(%rsp),%rdx 400854: 48 0f af ca imul %rdx,%rcx 400858: 48 8d 14 57 lea (%rdi,%rdx,2),%rdx 40085c: 49 8d 14 d6 lea 
(%r14,%rdx,8),%rdx 400860: 48 8d 0c 0e lea (%rsi,%rcx,1),%rcx 400864: 48 89 54 24 38 mov %rdx,0x38(%rsp) 400869: 8b 54 24 04 mov 0x4(%rsp),%edx 40086d: 48 89 4c 24 e0 mov %rcx,-0x20(%rsp) 400872: 48 89 4c 24 e8 mov %rcx,-0x18(%rsp) 400877: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx 40087c: 46 8d 7c 7a 01 lea 0x1(%rdx,%r15,2),%r15d 400881: 44 89 7c 24 44 mov %r15d,0x44(%rsp) 400886: 48 8d 4c 4f 01 lea 0x1(%rdi,%rcx,2),%rcx 40088b: 48 89 4c 24 48 mov %rcx,0x48(%rsp) 400890: 8b 5c 24 1c mov 0x1c(%rsp),%ebx 400894: 39 5c 24 04 cmp %ebx,0x4(%rsp) 400898: ba ff ff ff 7f mov $0x7fffffff,%edx 40089d: 0f 8f 51 03 00 00 jg 400bf4 <integrate_gf_npbc_+0x4d4> 4008a3: 48 8b b4 24 a0 00 00 mov 0xa0(%rsp),%rsi 4008aa: 00 4008ab: 48 8b bc 24 a8 00 00 mov 0xa8(%rsp),%rdi 4008b2: 00 4008b3: 48 8b 4c 24 e8 mov -0x18(%rsp),%rcx 4008b8: 4c 8b 74 24 f0 mov -0x10(%rsp),%r14 4008bd: 4c 8b 5c 24 f0 mov -0x10(%rsp),%r11 4008c2: 4c 03 5c 24 e8 add -0x18(%rsp),%r11 4008c7: 8b 36 mov (%rsi),%esi 4008c9: 8b 3f mov (%rdi),%edi 4008cb: 48 8b 5c 24 b0 mov -0x50(%rsp),%rbx 4008d0: f2 44 0f 10 10 movsd (%rax),%xmm10 4008d5: f2 44 0f 10 48 08 movsd 0x8(%rax),%xmm9 4008db: 49 c1 e6 04 shl $0x4,%r14 4008df: 48 63 d6 movslq %esi,%rdx 4008e2: 89 74 24 9c mov %esi,-0x64(%rsp) 4008e6: 89 7c 24 98 mov %edi,-0x68(%rsp) 4008ea: 48 8d 0c 0a lea (%rdx,%rcx,1),%rcx 4008ee: f2 44 0f 10 40 10 movsd 0x10(%rax),%xmm8 4008f4: 4c 89 74 24 88 mov %r14,-0x78(%rsp) 4008f9: 4c 8b 74 24 a0 mov -0x60(%rsp),%r14 4008fe: 4e 8d 1c 1a lea (%rdx,%r11,1),%r11 400902: 49 8d 34 cc lea (%r12,%rcx,8),%rsi 400906: 8b 4c 24 98 mov -0x68(%rsp),%ecx 40090a: 2b 4c 24 9c sub -0x64(%rsp),%ecx 40090e: 48 8d 14 53 lea (%rbx,%rdx,2),%rdx 400912: 4c 8b 7c 24 f0 mov -0x10(%rsp),%r15 400917: 4c 8b 54 24 f0 mov -0x10(%rsp),%r10 40091c: 4c 03 54 24 e0 add -0x20(%rsp),%r10 400921: 48 8b 7c 24 38 mov 0x38(%rsp),%rdi 400926: 49 c1 e3 03 shl $0x3,%r11 40092a: 4d 8d 74 d6 10 lea 0x10(%r14,%rdx,8),%r14 40092f: 4c 8b 4c 24 e0 mov -0x20(%rsp),%r9 400934: 44 8b 44 24 2c mov 0x2c(%rsp),%r8d 400939: 83 c1 01 add $0x1,%ecx 40093c: 4d 01 ff add %r15,%r15 40093f: 48 89 54 24 b8 mov %rdx,-0x48(%rsp) 400944: 89 cd mov %ecx,%ebp 400946: 89 4c 24 a8 mov %ecx,-0x58(%rsp) 40094a: 4c 89 7c 24 90 mov %r15,-0x70(%rsp) 40094f: d1 ed shr %ebp 400951: 4c 89 74 24 d0 mov %r14,-0x30(%rsp) 400956: 8d 4c 2d 00 lea 0x0(%rbp,%rbp,1),%ecx 40095a: 89 4c 24 ac mov %ecx,-0x54(%rsp) 40095e: 03 4c 24 9c add -0x64(%rsp),%ecx 400962: 89 4c 24 dc mov %ecx,-0x24(%rsp) 400966: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 40096d: 00 00 00 400970: 44 8b 7c 24 98 mov -0x68(%rsp),%r15d 400975: 44 39 7c 24 9c cmp %r15d,-0x64(%rsp) 40097a: 66 0f 57 ed xorpd %xmm5,%xmm5 40097e: 66 0f 28 c5 movapd %xmm5,%xmm0 400982: 66 0f 28 fd movapd %xmm5,%xmm7 400986: 0f 8f 1e 02 00 00 jg 400baa <integrate_gf_npbc_+0x48a> 40098c: 8b 54 24 ac mov -0x54(%rsp),%edx 400990: 85 d2 test %edx,%edx 400992: 0f 84 9f 03 00 00 je 400d37 <integrate_gf_npbc_+0x617> 400998: 83 7c 24 a8 09 cmpl $0x9,-0x58(%rsp) 40099d: 0f 86 94 03 00 00 jbe 400d37 <integrate_gf_npbc_+0x617> 4009a3: 66 0f 57 e4 xorpd %xmm4,%xmm4 4009a7: 48 8b 54 24 b8 mov -0x48(%rsp),%rdx 4009ac: 4c 8b 74 24 a0 mov -0x60(%rsp),%r14 4009b1: 48 8b 5c 24 d0 mov -0x30(%rsp),%rbx 4009b6: 4f 8d 3c 1c lea (%r12,%r11,1),%r15 4009ba: 66 0f 28 fc movapd %xmm4,%xmm7 4009be: 66 0f 28 ec movapd %xmm4,%xmm5 4009c2: 66 44 0f 28 dc movapd %xmm4,%xmm11 4009c7: 49 8d 4c d6 08 lea 0x8(%r14,%rdx,8),%rcx 4009cc: 31 d2 xor %edx,%edx 4009ce: 45 31 f6 xor %r14d,%r14d 4009d1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 
4009d8: f2 44 0f 10 24 16 movsd (%rsi,%rdx,1),%xmm12 4009de: 41 83 c6 01 add $0x1,%r14d 4009e2: f2 0f 10 31 movsd (%rcx),%xmm6 4009e6: 66 44 0f 16 64 16 08 movhpd 0x8(%rsi,%rdx,1),%xmm12 4009ed: f2 41 0f 10 04 17 movsd (%r15,%rdx,1),%xmm0 4009f3: 66 0f 16 71 08 movhpd 0x8(%rcx),%xmm6 4009f8: 66 41 0f 28 dc movapd %xmm12,%xmm3 4009fd: f2 44 0f 10 61 10 movsd 0x10(%rcx),%xmm12 400a03: 66 0f 28 ce movapd %xmm6,%xmm1 400a07: 66 41 0f 16 44 17 08 movhpd 0x8(%r15,%rdx,1),%xmm0 400a0e: 66 44 0f 16 61 18 movhpd 0x18(%rcx),%xmm12 400a14: f2 0f 10 33 movsd (%rbx),%xmm6 400a18: 66 0f 28 d0 movapd %xmm0,%xmm2 400a1c: 48 83 c2 10 add $0x10,%rdx 400a20: 66 41 0f 14 cc unpcklpd %xmm12,%xmm1 400a25: 66 0f 16 73 08 movhpd 0x8(%rbx),%xmm6 400a2a: f2 44 0f 10 63 10 movsd 0x10(%rbx),%xmm12 400a30: 48 83 c1 20 add $0x20,%rcx 400a34: 66 0f 28 c6 movapd %xmm6,%xmm0 400a38: 66 44 0f 16 63 18 movhpd 0x18(%rbx),%xmm12 400a3e: 66 0f 28 f1 movapd %xmm1,%xmm6 400a42: 66 0f 59 ca mulpd %xmm2,%xmm1 400a46: 48 83 c3 20 add $0x20,%rbx 400a4a: 41 39 ee cmp %ebp,%r14d 400a4d: 66 41 0f 14 c4 unpcklpd %xmm12,%xmm0 400a52: 66 0f 59 f3 mulpd %xmm3,%xmm6 400a56: 66 0f 59 d8 mulpd %xmm0,%xmm3 400a5a: 66 0f 58 f9 addpd %xmm1,%xmm7 400a5e: 66 0f 59 c2 mulpd %xmm2,%xmm0 400a62: 66 44 0f 58 de addpd %xmm6,%xmm11 400a67: 66 0f 58 eb addpd %xmm3,%xmm5 400a6b: 66 0f 58 e0 addpd %xmm0,%xmm4 400a6f: 0f 82 63 ff ff ff jb 4009d8 <integrate_gf_npbc_+0x2b8> 400a75: 66 0f 28 c4 movapd %xmm4,%xmm0 400a79: 8b 54 24 a8 mov -0x58(%rsp),%edx 400a7d: 66 44 0f 28 e7 movapd %xmm7,%xmm12 400a82: 39 54 24 ac cmp %edx,-0x54(%rsp) 400a86: 66 0f 15 c0 unpckhpd %xmm0,%xmm0 400a8a: 8b 4c 24 dc mov -0x24(%rsp),%ecx 400a8e: 66 45 0f 15 e4 unpckhpd %xmm12,%xmm12 400a93: 66 0f 28 f0 movapd %xmm0,%xmm6 400a97: 66 0f 28 c5 movapd %xmm5,%xmm0 400a9b: f2 0f 58 f4 addsd %xmm4,%xmm6 400a9f: 66 41 0f 28 e4 movapd %xmm12,%xmm4 400aa4: 66 0f 15 c0 unpckhpd %xmm0,%xmm0 400aa8: 66 45 0f 28 e3 movapd %xmm11,%xmm12 400aad: f2 0f 58 e7 addsd %xmm7,%xmm4 400ab1: 66 45 0f 15 e4 unpckhpd %xmm12,%xmm12 400ab6: 66 0f 28 f8 movapd %xmm0,%xmm7 400aba: f2 0f 58 fd addsd %xmm5,%xmm7 400abe: 66 41 0f 28 ec movapd %xmm12,%xmm5 400ac3: f2 41 0f 58 eb addsd %xmm11,%xmm5 400ac8: 0f 84 90 00 00 00 je 400b5e <integrate_gf_npbc_+0x43e> 400ace: 48 63 d1 movslq %ecx,%rdx 400ad1: 4c 8b 7c 24 b0 mov -0x50(%rsp),%r15 400ad6: 4a 8d 1c 0a lea (%rdx,%r9,1),%rbx 400ada: 4d 8d 34 dc lea (%r12,%rbx,8),%r14 400ade: 4a 8d 1c 12 lea (%rdx,%r10,1),%rbx 400ae2: 49 8d 54 57 01 lea 0x1(%r15,%rdx,2),%rdx 400ae7: 4c 8b 7c 24 a0 mov -0x60(%rsp),%r15 400aec: 49 8d 1c dc lea (%r12,%rbx,8),%rbx 400af0: 49 8d 14 d7 lea (%r15,%rdx,8),%rdx 400af4: 44 8b 7c 24 98 mov -0x68(%rsp),%r15d 400af9: 41 29 cf sub %ecx,%r15d 400afc: 31 c9 xor %ecx,%ecx 400afe: 4e 8d 3c fd 08 00 00 lea 0x8(,%r15,8),%r15 400b05: 00 400b06: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 400b0d: 00 00 00 400b10: f2 41 0f 10 1e movsd (%r14),%xmm3 400b15: 48 83 c1 08 add $0x8,%rcx 400b19: f2 0f 10 0b movsd (%rbx),%xmm1 400b1d: 49 83 c6 08 add $0x8,%r14 400b21: f2 0f 10 12 movsd (%rdx),%xmm2 400b25: 48 83 c3 08 add $0x8,%rbx 400b29: f2 0f 10 42 08 movsd 0x8(%rdx),%xmm0 400b2e: 48 83 c2 10 add $0x10,%rdx 400b32: 66 44 0f 28 db movapd %xmm3,%xmm11 400b37: 4c 39 f9 cmp %r15,%rcx 400b3a: f2 0f 59 d8 mulsd %xmm0,%xmm3 400b3e: f2 44 0f 59 da mulsd %xmm2,%xmm11 400b43: f2 0f 59 c1 mulsd %xmm1,%xmm0 400b47: f2 0f 59 d1 mulsd %xmm1,%xmm2 400b4b: f2 0f 58 fb addsd %xmm3,%xmm7 400b4f: f2 41 0f 58 eb addsd %xmm11,%xmm5 400b54: f2 0f 58 f0 addsd %xmm0,%xmm6 400b58: 
f2 0f 58 e2 addsd %xmm2,%xmm4 400b5c: 75 b2 jne 400b10 <integrate_gf_npbc_+0x3f0> 400b5e: f2 0f 10 4f 08 movsd 0x8(%rdi),%xmm1 400b63: f2 0f 10 57 18 movsd 0x18(%rdi),%xmm2 400b68: 66 0f 28 dc movapd %xmm4,%xmm3 400b6c: f2 0f 10 47 10 movsd 0x10(%rdi),%xmm0 400b71: f2 0f 59 da mulsd %xmm2,%xmm3 400b75: f2 0f 59 c5 mulsd %xmm5,%xmm0 400b79: f2 0f 59 67 20 mulsd 0x20(%rdi),%xmm4 400b7e: f2 0f 59 e9 mulsd %xmm1,%xmm5 400b82: f2 0f 59 f9 mulsd %xmm1,%xmm7 400b86: f2 0f 59 f2 mulsd %xmm2,%xmm6 400b8a: f2 41 0f 10 4d 00 movsd 0x0(%r13),%xmm1 400b90: f2 0f 58 eb addsd %xmm3,%xmm5 400b94: f2 0f 58 c4 addsd %xmm4,%xmm0 400b98: f2 0f 58 fe addsd %xmm6,%xmm7 400b9c: f2 41 0f 59 6d 08 mulsd 0x8(%r13),%xmm5 400ba2: f2 0f 59 c1 mulsd %xmm1,%xmm0 400ba6: f2 0f 59 f9 mulsd %xmm1,%xmm7 400baa: f2 44 0f 58 d7 addsd %xmm7,%xmm10 400baf: 48 83 c7 20 add $0x20,%rdi 400bb3: 48 03 74 24 88 add -0x78(%rsp),%rsi 400bb8: f2 44 0f 58 c8 addsd %xmm0,%xmm9 400bbd: 4c 03 5c 24 88 add -0x78(%rsp),%r11 400bc2: 4c 03 4c 24 90 add -0x70(%rsp),%r9 400bc7: f2 44 0f 58 c5 addsd %xmm5,%xmm8 400bcc: 4c 03 54 24 90 add -0x70(%rsp),%r10 400bd1: 45 85 c0 test %r8d,%r8d 400bd4: f2 44 0f 11 10 movsd %xmm10,(%rax) 400bd9: f2 44 0f 11 48 08 movsd %xmm9,0x8(%rax) 400bdf: f2 44 0f 11 40 10 movsd %xmm8,0x10(%rax) 400be5: 74 09 je 400bf0 <integrate_gf_npbc_+0x4d0> 400be7: 41 83 e8 01 sub $0x1,%r8d 400beb: e9 80 fd ff ff jmpq 400970 <integrate_gf_npbc_+0x250> 400bf0: 8b 54 24 44 mov 0x44(%rsp),%edx 400bf4: 3b 54 24 0c cmp 0xc(%rsp),%edx 400bf8: 0f 84 f5 00 00 00 je 400cf3 <integrate_gf_npbc_+0x5d3> 400bfe: 48 8b 94 24 a0 00 00 mov 0xa0(%rsp),%rdx 400c05: 00 400c06: 48 8b 9c 24 a8 00 00 mov 0xa8(%rsp),%rbx 400c0d: 00 400c0e: 66 0f 57 c0 xorpd %xmm0,%xmm0 400c12: 8b 0a mov (%rdx),%ecx 400c14: 8b 33 mov (%rbx),%esi 400c16: 66 0f 28 d0 movapd %xmm0,%xmm2 400c1a: 66 0f 28 d8 movapd %xmm0,%xmm3 400c1e: 39 f1 cmp %esi,%ecx 400c20: 0f 8f b1 00 00 00 jg 400cd7 <integrate_gf_npbc_+0x5b7> 400c26: 48 8b 5c 24 f8 mov -0x8(%rsp),%rbx 400c2b: 48 8b 7c 24 b0 mov -0x50(%rsp),%rdi 400c30: 48 63 d1 movslq %ecx,%rdx 400c33: 4c 8b 74 24 a0 mov -0x60(%rsp),%r14 400c38: 66 0f 57 c9 xorpd %xmm1,%xmm1 400c3c: 29 ce sub %ecx,%esi 400c3e: 31 c9 xor %ecx,%ecx 400c40: 48 8d 1c 1a lea (%rdx,%rbx,1),%rbx 400c44: 48 8d 54 57 01 lea 0x1(%rdi,%rdx,2),%rdx 400c49: 48 8d 34 f5 08 00 00 lea 0x8(,%rsi,8),%rsi 400c50: 00 400c51: 66 0f 28 d1 movapd %xmm1,%xmm2 400c55: 49 8d 1c dc lea (%r12,%rbx,8),%rbx 400c59: 49 8d 14 d6 lea (%r14,%rdx,8),%rdx 400c5d: 0f 1f 00 nopl (%rax) 400c60: f2 0f 10 03 movsd (%rbx),%xmm0 400c64: 48 83 c1 08 add $0x8,%rcx 400c68: f2 0f 10 1a movsd (%rdx),%xmm3 400c6c: 48 83 c3 08 add $0x8,%rbx 400c70: f2 0f 59 d8 mulsd %xmm0,%xmm3 400c74: f2 0f 59 42 08 mulsd 0x8(%rdx),%xmm0 400c79: 48 83 c2 10 add $0x10,%rdx 400c7d: 48 39 f1 cmp %rsi,%rcx 400c80: f2 0f 58 cb addsd %xmm3,%xmm1 400c84: f2 0f 58 d0 addsd %xmm0,%xmm2 400c88: 75 d6 jne 400c60 <integrate_gf_npbc_+0x540> 400c8a: 48 8b 54 24 20 mov 0x20(%rsp),%rdx 400c8f: 4c 8b 7c 24 48 mov 0x48(%rsp),%r15 400c94: f2 41 0f 10 65 00 movsd 0x0(%r13),%xmm4 400c9a: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx 400c9f: 48 8b 5c 24 10 mov 0x10(%rsp),%rbx 400ca4: 48 8b 74 24 20 mov 0x20(%rsp),%rsi 400ca9: f2 42 0f 10 04 fa movsd (%rdx,%r15,8),%xmm0 400caf: 66 0f 28 d8 movapd %xmm0,%xmm3 400cb3: 48 8d 54 4b 02 lea 0x2(%rbx,%rcx,2),%rdx 400cb8: f2 41 0f 59 45 08 mulsd 0x8(%r13),%xmm0 400cbe: f2 0f 59 dc mulsd %xmm4,%xmm3 400cc2: f2 0f 59 da mulsd %xmm2,%xmm3 400cc6: f2 0f 10 14 d6 movsd (%rsi,%rdx,8),%xmm2 400ccb: f2 0f 59 c1 mulsd 
%xmm1,%xmm0 400ccf: f2 0f 59 d4 mulsd %xmm4,%xmm2 400cd3: f2 0f 59 d1 mulsd %xmm1,%xmm2 400cd7: f2 0f 58 18 addsd (%rax),%xmm3 400cdb: f2 0f 58 50 08 addsd 0x8(%rax),%xmm2 400ce0: f2 0f 58 40 10 addsd 0x10(%rax),%xmm0 400ce5: f2 0f 11 18 movsd %xmm3,(%rax) 400ce9: f2 0f 11 50 08 movsd %xmm2,0x8(%rax) 400cee: f2 0f 11 40 10 movsd %xmm0,0x10(%rax) 400cf3: 48 8b 7c 24 c0 mov -0x40(%rsp),%rdi 400cf8: 49 83 c5 10 add $0x10,%r13 400cfc: 48 01 7c 24 f8 add %rdi,-0x8(%rsp) 400d01: 48 01 7c 24 e0 add %rdi,-0x20(%rsp) 400d06: 44 8b 74 24 08 mov 0x8(%rsp),%r14d 400d0b: 48 01 7c 24 e8 add %rdi,-0x18(%rsp) 400d10: 44 39 74 24 cc cmp %r14d,-0x34(%rsp) 400d15: 74 11 je 400d28 <integrate_gf_npbc_+0x608> 400d17: 83 44 24 cc 01 addl $0x1,-0x34(%rsp) 400d1c: e9 6f fb ff ff jmpq 400890 <integrate_gf_npbc_+0x170> 400d21: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 400d28: 48 83 c4 50 add $0x50,%rsp 400d2c: 5b pop %rbx 400d2d: 5d pop %rbp 400d2e: 41 5c pop %r12 400d30: 41 5d pop %r13 400d32: 41 5e pop %r14 400d34: 41 5f pop %r15 400d36: c3 retq 400d37: 66 0f 57 e4 xorpd %xmm4,%xmm4 400d3b: 8b 4c 24 9c mov -0x64(%rsp),%ecx 400d3f: 66 0f 28 ec movapd %xmm4,%xmm5 400d43: 66 0f 28 f4 movapd %xmm4,%xmm6 400d47: 66 0f 28 fc movapd %xmm4,%xmm7 400d4b: e9 7e fd ff ff jmpq 400ace <integrate_gf_npbc_+0x3ae>
This seems to me like a bug in gcc. From the following analysis (start reading at 0x400a38), the value loaded from memory is never used -- xmm12 is completely overwritten by subsequent instructions, either in the post-loop block, or in the first instruction of the next iteration.

==12860== Invalid read of size 8
==12860==    at 0x400A38: integrate_gf_npbc_

  # def xmm12 (low half loaded, high half zeroed)
  4009d8:  f2 44 0f 10 24 16        movsd  (%rsi,%rdx,1),%xmm12

  4009de:  41 83 c6 01              add    $0x1,%r14d
  4009e2:  f2 0f 10 31              movsd  (%rcx),%xmm6
  4009e6:  66 44 0f 16 64 16 08     movhpd 0x8(%rsi,%rdx,1),%xmm12
  4009ed:  f2 41 0f 10 04 17        movsd  (%r15,%rdx,1),%xmm0
  4009f3:  66 0f 16 71 08           movhpd 0x8(%rcx),%xmm6
  4009f8:  66 41 0f 28 dc           movapd %xmm12,%xmm3
  4009fd:  f2 44 0f 10 61 10        movsd  0x10(%rcx),%xmm12
  400a03:  66 0f 28 ce              movapd %xmm6,%xmm1
  400a07:  66 41 0f 16 44 17 08     movhpd 0x8(%r15,%rdx,1),%xmm0
  400a0e:  66 44 0f 16 61 18        movhpd 0x18(%rcx),%xmm12
  400a14:  f2 0f 10 33              movsd  (%rbx),%xmm6
  400a18:  66 0f 28 d0              movapd %xmm0,%xmm2
  400a1c:  48 83 c2 10              add    $0x10,%rdx
  400a20:  66 41 0f 14 cc           unpcklpd %xmm12,%xmm1
  400a25:  66 0f 16 73 08           movhpd 0x8(%rbx),%xmm6
  400a2a:  f2 44 0f 10 63 10        movsd  0x10(%rbx),%xmm12
  400a30:  48 83 c1 20              add    $0x20,%rcx
  400a34:  66 0f 28 c6              movapd %xmm6,%xmm0

  # load high half xmm12 (error reported here).  low half unchanged.
  400a38:  66 44 0f 16 63 18        movhpd 0x18(%rbx),%xmm12

  400a3e:  66 0f 28 f1              movapd %xmm1,%xmm6
  400a42:  66 0f 59 ca              mulpd  %xmm2,%xmm1
  400a46:  48 83 c3 20              add    $0x20,%rbx
  400a4a:  41 39 ee                 cmp    %ebp,%r14d

  # reads low half xmm12 only
  400a4d:  66 41 0f 14 c4           unpcklpd %xmm12,%xmm0

  400a52:  66 0f 59 f3              mulpd  %xmm3,%xmm6
  400a56:  66 0f 59 d8              mulpd  %xmm0,%xmm3
  400a5a:  66 0f 58 f9              addpd  %xmm1,%xmm7
  400a5e:  66 0f 59 c2              mulpd  %xmm2,%xmm0
  400a62:  66 44 0f 58 de           addpd  %xmm6,%xmm11
  400a67:  66 0f 58 eb              addpd  %xmm3,%xmm5
  400a6b:  66 0f 58 e0              addpd  %xmm0,%xmm4
  400a6f:  0f 82 63 ff ff ff        jb     4009d8    # (loop head)

  400a75:  66 0f 28 c4              movapd %xmm4,%xmm0
  400a79:  8b 54 24 a8              mov    -0x58(%rsp),%edx

  # def xmm12 (overwrite both halves)
  400a7d:  66 44 0f 28 e7           movapd %xmm7,%xmm12
A similar testcase is gcc's own libcpp/lex.c optimization, which can also access a few bytes past the end of a malloc'ed area, as long as at least one byte of the value read lies within the malloc'ed area. See the search_line_* routines in lex.c: not just the SSE4.2/SSE2 variants, even the generic C version does this. I guess valgrind could somehow mark the extra bytes as undefined and propagate that through the following arithmetic instructions, complaining only if a conditional jump depends solely on the undefined bits or if the undefined bits are stored somewhere (or similar heuristics).
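For readers who do not have lex.c at hand, here is a rough sketch of the word-at-a-time idiom being described (a simplified stand-in in the spirit of the generic search_line_* code, not the actual GCC source; find_newline is my own name). It assumes, as libcpp arranges, that the buffer is terminated by a '\n' inside the allocation: the final aligned word load may still reach a few bytes past the end of the malloc'ed block, but an aligned load never crosses a page boundary, and the loop exits before anything depends solely on the out-of-bounds bytes.

#include <stddef.h>
#include <stdint.h>

const char *find_newline(const char *s)
{
    const size_t ones  = (size_t)-1 / 0xff;  /* 0x0101...01 */
    const size_t highs = ones << 7;          /* 0x8080...80 */
    const size_t nls   = ones * '\n';        /* '\n' repeated in every byte */

    /* Scalar prologue: advance byte by byte until s is word-aligned. */
    while ((uintptr_t)s % sizeof(size_t) != 0) {
        if (*s == '\n')
            return s;
        s++;
    }

    /* Word loop: each iteration does one aligned load.  The last load may
       read a few bytes beyond the terminating '\n' (and hence beyond the
       malloc'ed block), but those bytes never decide the exit condition. */
    const size_t *p = (const size_t *)s;
    for (;;) {
        size_t x = *p ^ nls;                 /* bytes equal to '\n' become 0 */
        if ((x - ones) & ~x & highs)         /* "word contains a zero byte"; */
            break;                           /* can only over-report above a */
        p++;                                 /* genuine match, so it is safe */
    }

    /* The matching word contains a real '\n'; pinpoint it byte by byte. */
    const char *q = (const char *)p;
    while (*q != '\n')
        q++;
    return q;
}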
(In reply to comment #5)
> This seems to me like a bug in gcc.

Unfortunately, I'm an asm novice, so I can't tell. I see Jakub is on the CC as well, so maybe he can judge? Alternatively, I could reopen http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522 and refer to this report there?
(In reply to comment #6)
> A similar testcase is gcc's own libcpp/lex.c optimization, which can also
> access a few bytes past the end of a malloc'ed area, as long as at least
> one byte of the value read lies within the malloc'ed area.

Those loops are (effectively) vectorised while loops, in which you use standard carry-chain propagation tricks to ensure that the stopping condition for the loop does not rely on the data from beyond the malloc'ed area. It is not possible to vectorise them without such over-reading.

By contrast, Joost's loop (and anything gcc can vectorise) is a countable loop: the trip count is known (at run time) before the loop begins. It is always possible to vectorise such a loop without generating memory over-reads, by having a vector loop do (trip_count / vector_width) iterations and a scalar fixup loop do the final (trip_count % vector_width) iterations.

> I guess valgrind could somehow mark the extra bytes as undefined and
> propagate that through the following arithmetic instructions, complaining
> only if a conditional jump depends solely on the undefined bits or if the
> undefined bits are stored somewhere (or similar heuristics).

Well, maybe ... but Memcheck is too slow already. I don't want to junk it up with expensive and complicated heuristics that are irrelevant for 99.9% of the loads it will encounter. If you can show me some way to identify just the loads that need special treatment, then maybe. I don't see how to identify them, though.
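To make the "vector loop plus scalar fixup" shape concrete, here is a sketch written with SSE2 intrinsics (my own illustration of the structure being described, not code gcc actually emits; dot_no_overread is a hypothetical name). The vector body handles trip_count / 2 pairs of doubles and the scalar tail handles the remaining trip_count % 2 element, so no load ever touches memory beyond the last array element.

/* Vector body + scalar remainder: a countable loop vectorised without any
 * over-reading load (illustrative sketch, not gcc output). */
#include <emmintrin.h>

double dot_no_overread(const double *a, const double *b, int n)
{
    __m128d acc = _mm_setzero_pd();
    int i = 0;

    /* Vector body: trip count is known before the loop starts. */
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i),
                                         _mm_loadu_pd(b + i)));

    /* Horizontal sum of the two lanes. */
    double s = _mm_cvtsd_f64(acc) + _mm_cvtsd_f64(_mm_unpackhi_pd(acc, acc));

    /* Scalar fixup for the final n % 2 iteration. */
    for (; i < n; i++)
        s += a[i] * b[i];
    return s;
}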
Another simple testcase: https://bugzilla.redhat.com/show_bug.cgi?id=678518

I don't think the 99% figure above is right; at least with code generated by recent gcc, these false positives are just way too common. We disable a bunch of them in glibc through a suppression file or by overriding the string-op implementations, but when gcc inlines those there is no way to get rid of the false positives.

Can't valgrind just start tracking in more detail whether the bytes are actually used when memcheck sees a suspect read (in most cases an aligned read where at least the first byte is still inside the allocated region and perhaps some further ones aren't)? Then force retranslation of the bb it occurred in, or something similar?
(In reply to comment #9)
> I don't think the 99% figure above is right; at least with code generated
> by recent gcc [...]

What version of gcc?
The following testcase, where strlen does this, is expanded that way by GCC 4.6 (currently used e.g. in Fedora 15) with default options, but e.g. 4.5 or even earlier versions expand it the same way with -O2 -minline-all-stringops:

#include <stdlib.h>
#include <string.h>

__attribute__((noinline)) void
foo (void *p)
{
  memcpy (p, "0123456789abcd", 15);
}

int
main (void)
{
  void *p = malloc (15);
  foo (p);
  return strlen (p) - 14;
}
I can see this problem isn't going to go away (alas), and we are seeing similar things with icc-generated code. I'll look into it, but that won't happen for at least a couple of weeks.
Isn't this exactly the problem that "--partial-loads-ok" is meant to address? (cf. bug 294285) http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok
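For reference, the manual entry for that option boils down to roughly the following rule. The sketch below is a simplified restatement in C, not Memcheck's actual implementation; check_load, load_policy and addressable() are hypothetical names, with addressable() standing in for Memcheck's A-bit lookup, and loads are assumed to be at most 8 bytes wide. With --partial-loads-ok=yes, a naturally aligned load that partially overlaps addressable memory is tolerated, with the out-of-range bytes marked undefined instead of being reported.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

extern bool addressable(const void *addr);   /* hypothetical A-bit query */

typedef struct {
    bool report_error;     /* emit an "invalid read" for this load */
    unsigned char undef;   /* bytes whose value must be marked undefined */
} load_policy;

load_policy check_load(const void *addr, size_t size)
{
    load_policy p = { false, 0 };
    bool aligned = ((uintptr_t)addr % size) == 0;
    size_t n_ok = 0;

    for (size_t i = 0; i < size; i++) {
        if (addressable((const char *)addr + i))
            n_ok++;
        else
            p.undef |= (unsigned char)(1u << i);
    }

    if (n_ok == size) {
        /* Fully addressable: an ordinary load, nothing to flag. */
    } else if (aligned && n_ok != 0) {
        /* Partially addressable, naturally aligned: tolerate the load but
           treat the out-of-bounds bytes as uninitialised from here on. */
    } else {
        /* Completely unaddressable, or an unaligned partial load. */
        p.report_error = true;
        p.undef = 0;       /* reported loads are treated as initialised */
    }
    return p;
}

If the over-reading loads gcc emits are naturally aligned and keep at least one byte inside the block, as described in comment #2, they would fall under the tolerated case of this rule.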
Could this bug be the same issue as bug 301922?
Hi all, I think I'm also seeing false positives because of vectorization, which unfortunately decreases the usefulness of valgrind. Below is a minimal working example that reproduces the problem with std::string. The code is basically extracted from a library I was using (casacore 1.5), and in my software it generates a lot of incorrect "invalid read"s, although the library seems to be valid (even though inheriting from std::string would not be my preferred solution). I hope this example is of use for evaluating the problem further.

#include <string>
#include <iostream>
#include <cstring>
#include <malloc.h>

class StringAlt : public std::string
{
public:
  StringAlt(const char *c) : std::string(c) { }
  void operator=(const char *c) { std::string::operator=(c); }
};

typedef StringAlt StringImp;
//typedef std::string StringImp;  //<-- replacing prev with this also solves issue

int main(int argc, char *argv[])
{
  const char *a1 = "blaaa";
  char *a2 = strdup(a1);
  a2[2] = 0;
  StringImp s(a1);
  std::cout << "Assign A2\n";
  s = a2;
  std::cout << s << '\n';
  std::cout << "Assign A1\n";
  s = a1;
  std::cout << s << '\n';
  char *a3 = strdup(s.c_str());
  std::cout << "Assign A3\n";
  s = a3;
  std::cout << s << '\n';
  free(a2);
  free(a3);
}

Compiled with g++ Debian 4.7.1-2, with "-O2" or "-O3", this results in the error below. With "-O0" it works fine. Changing the order of statements can also make the error disappear, which makes it very hard to debug.

Output:

Assign A2
bl
Assign A1
blaaa
Assign A3
==20872== Invalid read of size 4
==20872==    at 0x400C5C: main (in /home/anoko/projects/test/test)
==20872==  Address 0x59550f4 is 4 bytes inside a block of size 6 alloc'd
==20872==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
==20872==    by 0x564D911: strdup (strdup.c:43)
==20872==    by 0x400C46: main (in /home/anoko/projects/test/test)
==20872==
blaaa
(In reply to comment #15)
Try rebuilding the library with -fno-builtin-strdup; chances are it will make valgrind work again.
-fno-builtin-strdup does indeed get rid of the valgrind message.