on ppc64le memcheck/tests/linux/dlclose_leak fails: --- dlclose_leak.stderr.exp 2021-10-28 17:52:34.597615502 -0400 +++ dlclose_leak.stderr.out 2021-10-28 18:27:39.453343651 -0400 @@ -2,13 +2,6 @@ at 0x........: jmp_on_uninit (dlclose_leak_so.c:10) by 0x........: main (dlclose_leak.c:29) -Invalid read of size 1 - at 0x........: main (dlclose_leak.c:32) - Address 0x........ is 1 bytes before a block of size 1 alloc'd - at 0x........: malloc (vg_replace_malloc.c:...) - by 0x........: alloc_1_byte (dlclose_leak_so.c:20) - by 0x........: main (dlclose_leak.c:30) - Conditional jump or move depends on uninitialised value(s) at 0x........: jmp_on_uninit (dlclose_leak_so.c:10) by 0x........: main (dlclose_leak.c:29) It is missing the invalid read at dlclose_leak.c:32 32 x = memToLeak[-1]; This can also be seen with this much simpler program: # cat b.c #include <malloc.h> int main () { char *a = malloc (1); char x = a[-1]; } gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2) # gcc -g -o b b.c # valgrind -q ./b (nothing) On other arches this shows: ==102542== Invalid read of size 1 ==102542== at 0x4005CC: main (b.c:6) ==102542== Address 0x4a4303f is 1 bytes before a block of size 1 alloc'd ==102542== at 0x48680D8: malloc (vg_replace_malloc.c:381) ==102542== by 0x4005C3: main (b.c:5) On ppc64le the disassembly looks like: Dump of assembler code for function main: 4 { 0x000000001000064c <+0>: lis r2,4098 0x0000000010000650 <+4>: addi r2,r2,32512 0x0000000010000654 <+8>: mflr r0 0x0000000010000658 <+12>: std r0,16(r1) 0x000000001000065c <+16>: std r31,-8(r1) 0x0000000010000660 <+20>: stdu r1,-64(r1) 0x0000000010000664 <+24>: mr r31,r1 5 char *a = malloc (1); 0x0000000010000668 <+28>: li r3,1 0x000000001000066c <+32>: bl 0x10000460 <0000002a.plt_call.malloc@@GLIBC_2.17> 0x0000000010000670 <+36>: ld r2,24(r1) 0x0000000010000674 <+40>: mr r9,r3 0x0000000010000678 <+44>: std r9,32(r31) 6 char x = a[-1]; => 0x000000001000067c <+48>: ld r9,32(r31) 0x0000000010000680 <+52>: addi r9,r9,-1 0x0000000010000684 <+56>: lxsibzx vs0,0,r9 0x0000000010000688 <+60>: addi r9,r31,40 0x000000001000068c <+64>: stxsibx vs0,0,r9 0x0000000010000690 <+68>: li r9,0 7 } 0x0000000010000694 <+72>: mr r3,r9 0x0000000010000698 <+76>: addi r1,r31,64 0x000000001000069c <+80>: ld r0,16(r1) 0x00000000100006a0 <+84>: mtlr r0 0x00000000100006a4 <+88>: ld r31,-8(r1) 0x00000000100006a8 <+92>: blr 0x00000000100006ac <+96>: .long 0x0 0x00000000100006b0 <+100>: .long 0x1000000 0x00000000100006b4 <+104>: .long 0x1000180
Created attachment 142985 [details] Example binary b This is the ppc64le binary for: #include <malloc.h> int main () { char *a = malloc (1); char x = a[-1]; }
I wonder if this is also related to the following fail in helgrind/tests/free_is_write: -ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) +ERROR SUMMARY: 5 errors from 1 contexts (suppressed: 0 from 0) The error is detected 5 times in the same spot and the assembly shows a lxsibzx/ stxsibx pair used: 33 34 /* Write, which isn't coordinated with the free ==> a race 35 should be reported. */ 36 char c = s_mem[5]; 0x0000000010000a04 <+236>: nop 0x0000000010000a08 <+240>: addi r9,r2,-32408 0x0000000010000a0c <+244>: ld r9,0(r9) 0x0000000010000a10 <+248>: addi r9,r9,5 0x0000000010000a14 <+252>: lxsibzx vs0,0,r9 0x0000000010000a18 <+256>: addi r9,r31,36 0x0000000010000a1c <+260>: stxsibx vs0,0,r9 37 __asm__ __volatile__("" : : "r"((long)c) ); 0x0000000010000a20 <+264>: lbz r9,36(r31)
There is also the following failure in memcheck/tests/badrw --- badrw.stderr.exp 2021-10-28 17:52:34.577616207 -0400 +++ badrw.stderr.out 2021-10-28 17:57:16.957629621 -0400 @@ -10,24 +10,12 @@ at 0x........: malloc (vg_replace_malloc.c:...) by 0x........: main (badrw.c:5) -Invalid read of size 2 - at 0x........: main (badrw.c:22) - Address 0x........ is 4 bytes before a block of size 10 alloc'd - at 0x........: malloc (vg_replace_malloc.c:...) - by 0x........: main (badrw.c:5) - Invalid write of size 2 at 0x........: main (badrw.c:23) Address 0x........ is 4 bytes before a block of size 10 alloc'd at 0x........: malloc (vg_replace_malloc.c:...) by 0x........: main (badrw.c:5) -Invalid read of size 1 - at 0x........: main (badrw.c:25) - Address 0x........ is 1 bytes before a block of size 10 alloc'd - at 0x........: malloc (vg_replace_malloc.c:...) - by 0x........: main (badrw.c:5) - Invalid write of size 1 at 0x........: main (badrw.c:26) Address 0x........ is 1 bytes before a block of size 10 alloc'd Where the missing invalid read reports correspond with: 22 y2 = *x2; 0x00000000100006c0 <+116>: ld r9,48(r31) 0x00000000100006c4 <+120>: lxsihzx vs0,0,r9 0x00000000100006c8 <+124>: addi r9,r31,68 0x00000000100006cc <+128>: stxsihx vs0,0,r9 and 25 y1 = *x1; 0x00000000100006e0 <+148>: ld r9,56(r31) 0x00000000100006e4 <+152>: lxsibzx vs0,0,r9 0x00000000100006e8 <+156>: addi r9,r31,70 0x00000000100006ec <+160>: stxsibx vs0,0,r9
Can you give me some info on your test environment? I have run the regression tests (make regtest) on a Power 9 and a Power 10 system and don't see these errors. I tried testing the current updstream code from git on several systems. Power 9, Ubuntu 20.04 LE == 665 tests, 4 stderr failures, 0 stdout failures, 0 stderrB failures, 1 stdou\ tB failure, 2 post failures == gdbserver_tests/nlcontrolc (stdoutB) memcheck/tests/bug340392 (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) Power 10, Ubuntu 21.04 == 672 tests, 3 stderr failures, 1 stdout failure, 0 stderrB failures, 1 stdout\ B failure, 2 post failures == gdbserver_tests/nlcontrolc (stdoutB) memcheck/tests/bug340392 (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/ppc64/power_ISA2_05 (stdout) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) Power 10, Fedora 34 memcheck/tests/bug340392 (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/ppc64/power_ISA2_05 (stdout) massif/tests/new-cpp (post) massif/tests/overloaded-new (post)
(In reply to Carl Love from comment #4) > Can you give me some info on your test environment? I have run the > regression tests (make regtest) on a Power 9 and a Power 10 system and don't > see these errors. This is with gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2) on a POWER9 (CHRP IBM,8375-42A) See also the attached binary for the generated object code.
(In reply to Mark Wielaard from comment #5) > (In reply to Carl Love from comment #4) > > Can you give me some info on your test environment? I have run the > > regression tests (make regtest) on a Power 9 and a Power 10 system and don't > > see these errors. > > This is with gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2) on a POWER9 (CHRP > IBM,8375-42A) > > See also the attached binary for the generated object code. CentOS 9 Stream might have builds of that gcc for ppc64le: https://composes.stream.centos.org/development/latest-CentOS-Stream/compose/BaseOS/ppc64le/
I have access to a Fedora system with gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1) and glib 2.33. When I try to run the binary b I get the message: ./b: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./b) I will look for a system with CentOS 9 system, but that is not a usual one for us to have around.
(In reply to Carl Love from comment #7) > I have access to a Fedora system with gcc (GCC) 11.2.1 20210728 (Red Hat > 11.2.1-1) and glib 2.33. > > When I try to run the binary b I get the message: > > ./b: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./b) > > I will look for a system with CentOS 9 system, but that is not a usual one > for us to have around. Fedora 35 has glibc 2.34 (and maybe also a newer gcc): https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/AXGQCCE66LTSVJXWULKBT4DOMAWJM2BD/
(In reply to Mark Wielaard from comment #8) > (In reply to Carl Love from comment #7) > > I have access to a Fedora system with gcc (GCC) 11.2.1 20210728 (Red Hat > > 11.2.1-1) and glib 2.33. > > > > When I try to run the binary b I get the message: > > > > ./b: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./b) > > > > I will look for a system with CentOS 9 system, but that is not a usual one > > for us to have around. > > Fedora 35 has glibc 2.34 (and maybe also a newer gcc): > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/ > thread/AXGQCCE66LTSVJXWULKBT4DOMAWJM2BD/ I don't see the issues on the Fedora 35 build though: https://kojipkgs.fedoraproject.org//packages/valgrind/3.18.1/1.fc35/data/logs/ppc64le/build.log So that gcc might be configured differently so that it doesn't generate the lxsibzx instruction?
I found a Power 10 system with RHEL 9 installed. Looks like there is a compiler change that is now generating the lxsibzx and stxsibx instructions. I am seeing the instructions being generated on this system. Fedora 34 has gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1) dlclose_leak dump .... x = memToLeak[-1]; 10000a90: 30 00 3f e9 ld r9,48(r31) 10000a94: ff ff 29 89 lbz r9,-1(r9) 10000a98: 38 00 3f 99 stb r9,56(r31) int i; for (i = 0; i < 2; ++i) RHEL 9, gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2) dlclose_leak dump ... x = memToLeak[-1]; 10000a94: 30 00 3f e9 ld r9,48(r31) 10000a98: ff ff 29 39 addi r9,r9,-1 10000a9c: 1a 4e 00 7c lxsibzx vs0,0,r9 10000aa0: 38 00 3f 39 addi r9,r31,56 10000aa4: 1a 4f 00 7c stxsibx vs0,0,r9 int i; for (i = 0; i < 2; ++i) The newer gcc is generating the instructions in question. The test results are: == 671 tests, 8 stderr failures, 1 stdout failure, 1 stderrB failure, 0 stdoutB\ failures, 2 post failures == gdbserver_tests/hginfo (stderrB) memcheck/tests/badrw (stderr) memcheck/tests/bug340392 (stderr) memcheck/tests/linux/dlclose_leak-no-keep (stderr) memcheck/tests/linux/dlclose_leak (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/ppc64/power_ISA2_05 (stdout) helgrind/tests/free_is_write (stderr) helgrind/tests/tls_threads (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) The test is failing. I can now reproduce the bug.
yeah, the fedora 35 gcc generates: 6 char x = a[-1]; 0x000000001000071c <+48>: ld r9,32(r31) 0x0000000010000720 <+52>: lbz r9,-1(r9) 0x0000000010000724 <+56>: stb r9,40(r31) 0x0000000010000728 <+60>: li r9,0 which valgrind does catch: ==15687== Invalid read of size 1 ==15687== at 0x10000720: main (b.c:6) ==15687== Address 0x436003f is 1 bytes before a block of size 1 alloc'd ==15687== at 0x40A5344: malloc (vg_replace_malloc.c:380) ==15687== by 0x1000070F: main (b.c:5) I can replicate when using gcc -mcpu=power9 -g -o b b.c 6 char x = a[-1]; 0x000000001000071c <+48>: ld r9,32(r31) 0x0000000010000720 <+52>: addi r9,r9,-1 0x0000000010000724 <+56>: lxsibzx vs0,0,r9 0x0000000010000728 <+60>: addi r9,r31,40 0x000000001000072c <+64>: stxsibx vs0,0,r9 0x0000000010000730 <+68>: li r9,0 So I guess that the difference is the default -mcpu setting between gcc versions
Created attachment 143017 [details] Fix for the lxsibzx and lxsihzx instrutions The attached patch fixes the issues with the memcheck/tests/linux/dlclose_leak test. The issue only occurs on newer compiler with the -mcpu=power9 option which generates the lxsibzx and lxsihzx instructions.
Mark, give the patch a spin and see if it works for you. The test results on my machine show the test is fixed. == 671 tests, 4 stderr failures, 1 stdout failure, 1 stderrB failure, 0 stdoutB f\ ailures, 2 post failures == gdbserver_tests/hginfo (stderrB) memcheck/tests/bug340392 (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/ppc64/power_ISA2_05 (stdout) helgrind/tests/tls_threads (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post)
While fixing the lxsibzx instruction, I noticed the same issue with lxsihzx so fixed it at the same time. Looking at comment #3 I see you called out the lxsihzx instruction in the badrw test. Looks like the patch fixed the badrw test as well.
(In reply to Carl Love from comment #13) > Mark, give the patch a spin and see if it works for you. You patch fixes the following 4 tests: -memcheck/tests/badrw (stderr) -memcheck/tests/linux/dlclose_leak-no-keep (stderr) -memcheck/tests/linux/dlclose_leak (stderr) -helgrind/tests/free_is_write (stderr) And introduces no regresssions for me
The patch has been committed. commit 6e08ee95f7f1b1c3fd434fa380cc5b2cc3e3f7c7 (HEAD -> master, origin/master, origin/HEAD) Author: Carl Love <cel@us.ibm.com> Date: Fri Oct 29 16:30:33 2021 -0500 Bug 444571 - PPC, fix the lxsibzx and lxsihzx so they only load their respective sized data. The lxsibzx was doing a 64-bit load. The result was initializing additional bytes in the register that should not have been initialized. The memcheck/tests/linux/dlclose_leak test detected the issue. The code generation uses lxsibzx and stxsibx with -mcpu=power9. Previously the lbz and stb instructions were generated. The same issue was noted and fixed with the lxsihzx instruction. The memcheck/tests/linux/badrw test now passes as well. https://bugs.kde.org/show_bug.cgi?id=444571