Bug 444571 - ppc64le dlclose_leak fails (when lxsibzx is used)
Summary: ppc64le dlclose_leak fails (when lxsibzx is used)
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-28 22:59 UTC by Mark Wielaard
Modified: 2021-11-01 17:53 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Example binary b (76.98 KB, application/octet-stream)
2021-10-28 23:01 UTC, Mark Wielaard
Details
Fix for the lxsibzx and lxsihzx instrutions (2.10 KB, patch)
2021-10-29 21:54 UTC, Carl Love
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2021-10-28 22:59:00 UTC
on ppc64le memcheck/tests/linux/dlclose_leak fails:

--- dlclose_leak.stderr.exp	2021-10-28 17:52:34.597615502 -0400
+++ dlclose_leak.stderr.out	2021-10-28 18:27:39.453343651 -0400
@@ -2,13 +2,6 @@
    at 0x........: jmp_on_uninit (dlclose_leak_so.c:10)
    by 0x........: main (dlclose_leak.c:29)
 
-Invalid read of size 1
-   at 0x........: main (dlclose_leak.c:32)
- Address 0x........ is 1 bytes before a block of size 1 alloc'd
-   at 0x........: malloc (vg_replace_malloc.c:...)
-   by 0x........: alloc_1_byte (dlclose_leak_so.c:20)
-   by 0x........: main (dlclose_leak.c:30)
-
 Conditional jump or move depends on uninitialised value(s)
    at 0x........: jmp_on_uninit (dlclose_leak_so.c:10)
    by 0x........: main (dlclose_leak.c:29)

It is missing the invalid read at dlclose_leak.c:32

32	        x = memToLeak[-1];

This can also be seen with this much simpler program:

# cat b.c 
#include <malloc.h>

int main ()
{
  char *a = malloc (1);
  char x = a[-1];
}

gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2)

# gcc -g -o b b.c 

# valgrind -q ./b

(nothing)

On other arches this shows:

==102542== Invalid read of size 1
==102542==    at 0x4005CC: main (b.c:6)
==102542==  Address 0x4a4303f is 1 bytes before a block of size 1 alloc'd
==102542==    at 0x48680D8: malloc (vg_replace_malloc.c:381)
==102542==    by 0x4005C3: main (b.c:5)

On ppc64le the disassembly looks like:

Dump of assembler code for function main:
4	{
   0x000000001000064c <+0>:	lis     r2,4098
   0x0000000010000650 <+4>:	addi    r2,r2,32512
   0x0000000010000654 <+8>:	mflr    r0
   0x0000000010000658 <+12>:	std     r0,16(r1)
   0x000000001000065c <+16>:	std     r31,-8(r1)
   0x0000000010000660 <+20>:	stdu    r1,-64(r1)
   0x0000000010000664 <+24>:	mr      r31,r1

5	  char *a = malloc (1);
   0x0000000010000668 <+28>:	li      r3,1
   0x000000001000066c <+32>:	bl      0x10000460 <0000002a.plt_call.malloc@@GLIBC_2.17>
   0x0000000010000670 <+36>:	ld      r2,24(r1)
   0x0000000010000674 <+40>:	mr      r9,r3
   0x0000000010000678 <+44>:	std     r9,32(r31)

6	  char x = a[-1];
=> 0x000000001000067c <+48>:	ld      r9,32(r31)
   0x0000000010000680 <+52>:	addi    r9,r9,-1
   0x0000000010000684 <+56>:	lxsibzx vs0,0,r9
   0x0000000010000688 <+60>:	addi    r9,r31,40
   0x000000001000068c <+64>:	stxsibx vs0,0,r9
   0x0000000010000690 <+68>:	li      r9,0

7	}
   0x0000000010000694 <+72>:	mr      r3,r9
   0x0000000010000698 <+76>:	addi    r1,r31,64
   0x000000001000069c <+80>:	ld      r0,16(r1)
   0x00000000100006a0 <+84>:	mtlr    r0
   0x00000000100006a4 <+88>:	ld      r31,-8(r1)
   0x00000000100006a8 <+92>:	blr
   0x00000000100006ac <+96>:	.long 0x0
   0x00000000100006b0 <+100>:	.long 0x1000000
   0x00000000100006b4 <+104>:	.long 0x1000180
Comment 1 Mark Wielaard 2021-10-28 23:01:15 UTC
Created attachment 142985 [details]
Example binary b

This is the ppc64le binary for:

#include <malloc.h>

int main ()
{
  char *a = malloc (1);
  char x = a[-1];
}
Comment 2 Mark Wielaard 2021-10-29 08:08:45 UTC
I wonder if this is also related to the following fail in helgrind/tests/free_is_write:

-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
+ERROR SUMMARY: 5 errors from 1 contexts (suppressed: 0 from 0)

The error is detected 5 times in the same spot and the assembly shows a lxsibzx/
stxsibx pair used:

33	
34	    /* Write, which isn't coordinated with the free ==> a race
35	       should be reported. */
36	    char c = s_mem[5];
   0x0000000010000a04 <+236>:	nop
   0x0000000010000a08 <+240>:	addi    r9,r2,-32408
   0x0000000010000a0c <+244>:	ld      r9,0(r9)
   0x0000000010000a10 <+248>:	addi    r9,r9,5
   0x0000000010000a14 <+252>:	lxsibzx vs0,0,r9
   0x0000000010000a18 <+256>:	addi    r9,r31,36
   0x0000000010000a1c <+260>:	stxsibx vs0,0,r9

37	    __asm__ __volatile__("" : : "r"((long)c) );
   0x0000000010000a20 <+264>:	lbz     r9,36(r31)
Comment 3 Mark Wielaard 2021-10-29 08:12:01 UTC
There is also the following failure in memcheck/tests/badrw

--- badrw.stderr.exp	2021-10-28 17:52:34.577616207 -0400
+++ badrw.stderr.out	2021-10-28 17:57:16.957629621 -0400
@@ -10,24 +10,12 @@
    at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (badrw.c:5)
 
-Invalid read of size 2
-   at 0x........: main (badrw.c:22)
- Address 0x........ is 4 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_replace_malloc.c:...)
-   by 0x........: main (badrw.c:5)
-
 Invalid write of size 2
    at 0x........: main (badrw.c:23)
  Address 0x........ is 4 bytes before a block of size 10 alloc'd
    at 0x........: malloc (vg_replace_malloc.c:...)
    by 0x........: main (badrw.c:5)
 
-Invalid read of size 1
-   at 0x........: main (badrw.c:25)
- Address 0x........ is 1 bytes before a block of size 10 alloc'd
-   at 0x........: malloc (vg_replace_malloc.c:...)
-   by 0x........: main (badrw.c:5)
-
 Invalid write of size 1
    at 0x........: main (badrw.c:26)
  Address 0x........ is 1 bytes before a block of size 10 alloc'd

Where the missing invalid read reports correspond with:

22	   y2 = *x2;
   0x00000000100006c0 <+116>:	ld      r9,48(r31)
   0x00000000100006c4 <+120>:	lxsihzx vs0,0,r9
   0x00000000100006c8 <+124>:	addi    r9,r31,68
   0x00000000100006cc <+128>:	stxsihx vs0,0,r9

and

25	   y1 = *x1;
   0x00000000100006e0 <+148>:	ld      r9,56(r31)
   0x00000000100006e4 <+152>:	lxsibzx vs0,0,r9
   0x00000000100006e8 <+156>:	addi    r9,r31,70
   0x00000000100006ec <+160>:	stxsibx vs0,0,r9
Comment 4 Carl Love 2021-10-29 18:40:37 UTC
Can you give me some info on your test environment?  I have run the regression tests (make regtest) on a Power 9 and a Power 10 system and don't see these errors.

I tried testing the current updstream code from git on several systems.

Power 9, Ubuntu 20.04 LE 

== 665 tests, 4 stderr failures, 0 stdout failures, 0 stderrB failures, 1 stdou\
tB failure, 2 post failures ==                                                  
gdbserver_tests/nlcontrolc               (stdoutB)                              
memcheck/tests/bug340392                 (stderr)                               
memcheck/tests/leak_cpp_interior         (stderr)                               
memcheck/tests/linux/rfcomm              (stderr)                               
memcheck/tests/linux/sys-execveat        (stderr)                               
massif/tests/new-cpp                     (post)                                 
massif/tests/overloaded-new              (post) 

Power 10, Ubuntu 21.04 

== 672 tests, 3 stderr failures, 1 stdout failure, 0 stderrB failures, 1 stdout\
B failure, 2 post failures ==
gdbserver_tests/nlcontrolc               (stdoutB)
memcheck/tests/bug340392                 (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/ppc64/power_ISA2_05       (stdout)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)


Power 10, Fedora 34 

memcheck/tests/bug340392                 (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/ppc64/power_ISA2_05       (stdout)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)
Comment 5 Mark Wielaard 2021-10-29 19:02:06 UTC
(In reply to Carl Love from comment #4)
> Can you give me some info on your test environment?  I have run the
> regression tests (make regtest) on a Power 9 and a Power 10 system and don't
> see these errors.

This is with gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2) on a POWER9 (CHRP IBM,8375-42A)

See also the attached binary for the generated object code.
Comment 6 Mark Wielaard 2021-10-29 19:20:51 UTC
(In reply to Mark Wielaard from comment #5)
> (In reply to Carl Love from comment #4)
> > Can you give me some info on your test environment?  I have run the
> > regression tests (make regtest) on a Power 9 and a Power 10 system and don't
> > see these errors.
> 
> This is with gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2) on a POWER9 (CHRP
> IBM,8375-42A)
> 
> See also the attached binary for the generated object code.

CentOS 9 Stream might have builds of that gcc for ppc64le:
https://composes.stream.centos.org/development/latest-CentOS-Stream/compose/BaseOS/ppc64le/
Comment 7 Carl Love 2021-10-29 19:46:27 UTC
I have access to a Fedora system with gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1) and glib 2.33.

When I try to run the binary b I get the message:

./b: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./b)

I will look for a system with CentOS 9 system, but that is not a usual one for us to have around.
Comment 8 Mark Wielaard 2021-10-29 19:51:57 UTC
(In reply to Carl Love from comment #7)
> I have access to a Fedora system with gcc (GCC) 11.2.1 20210728 (Red Hat
> 11.2.1-1) and glib 2.33.
> 
> When I try to run the binary b I get the message:
> 
> ./b: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./b)
> 
> I will look for a system with CentOS 9 system, but that is not a usual one
> for us to have around.

Fedora 35 has glibc 2.34 (and maybe also a newer gcc):
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/AXGQCCE66LTSVJXWULKBT4DOMAWJM2BD/
Comment 9 Mark Wielaard 2021-10-29 19:56:12 UTC
(In reply to Mark Wielaard from comment #8)
> (In reply to Carl Love from comment #7)
> > I have access to a Fedora system with gcc (GCC) 11.2.1 20210728 (Red Hat
> > 11.2.1-1) and glib 2.33.
> > 
> > When I try to run the binary b I get the message:
> > 
> > ./b: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./b)
> > 
> > I will look for a system with CentOS 9 system, but that is not a usual one
> > for us to have around.
> 
> Fedora 35 has glibc 2.34 (and maybe also a newer gcc):
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/
> thread/AXGQCCE66LTSVJXWULKBT4DOMAWJM2BD/

I don't see the issues on the Fedora 35 build though:
https://kojipkgs.fedoraproject.org//packages/valgrind/3.18.1/1.fc35/data/logs/ppc64le/build.log

So that gcc might be configured differently so that it doesn't generate the lxsibzx instruction?
Comment 10 Carl Love 2021-10-29 20:46:09 UTC
I found a Power 10 system with RHEL 9 installed.  Looks like there is a compiler change that is now generating the lxsibzx and stxsibx instructions.  I am seeing the instructions being generated on this system.

Fedora 34 has 
gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)

dlclose_leak dump
....        x = memToLeak[-1];
    10000a90:   30 00 3f e9     ld      r9,48(r31)
    10000a94:   ff ff 29 89     lbz     r9,-1(r9)
    10000a98:   38 00 3f 99     stb     r9,56(r31)
    int i; for (i = 0; i < 2; ++i)


RHEL 9, gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2)

dlclose_leak dump
...
    x = memToLeak[-1];
    10000a94:   30 00 3f e9     ld      r9,48(r31)
    10000a98:   ff ff 29 39     addi    r9,r9,-1
    10000a9c:   1a 4e 00 7c     lxsibzx vs0,0,r9
    10000aa0:   38 00 3f 39     addi    r9,r31,56
    10000aa4:   1a 4f 00 7c     stxsibx vs0,0,r9
    int i; for (i = 0; i < 2; ++i)

The newer gcc is generating the instructions in question.

The test results are:
== 671 tests, 8 stderr failures, 1 stdout failure, 1 stderrB failure, 0 stdoutB\
 failures, 2 post failures ==
gdbserver_tests/hginfo                   (stderrB)
memcheck/tests/badrw                     (stderr)
memcheck/tests/bug340392                 (stderr)
memcheck/tests/linux/dlclose_leak-no-keep (stderr)
memcheck/tests/linux/dlclose_leak        (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/ppc64/power_ISA2_05       (stdout)
helgrind/tests/free_is_write             (stderr)
helgrind/tests/tls_threads               (stderr)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)

The test is failing.

I can now reproduce the bug.
Comment 11 Mark Wielaard 2021-10-29 20:54:00 UTC
yeah, the fedora 35 gcc generates:

6	  char x = a[-1];
   0x000000001000071c <+48>:	ld      r9,32(r31)
   0x0000000010000720 <+52>:	lbz     r9,-1(r9)
   0x0000000010000724 <+56>:	stb     r9,40(r31)
   0x0000000010000728 <+60>:	li      r9,0

which valgrind does catch:

==15687== Invalid read of size 1
==15687==    at 0x10000720: main (b.c:6)
==15687==  Address 0x436003f is 1 bytes before a block of size 1 alloc'd
==15687==    at 0x40A5344: malloc (vg_replace_malloc.c:380)
==15687==    by 0x1000070F: main (b.c:5)

I can replicate when using gcc -mcpu=power9 -g -o b b.c

6	  char x = a[-1];
   0x000000001000071c <+48>:	ld      r9,32(r31)
   0x0000000010000720 <+52>:	addi    r9,r9,-1
   0x0000000010000724 <+56>:	lxsibzx vs0,0,r9
   0x0000000010000728 <+60>:	addi    r9,r31,40
   0x000000001000072c <+64>:	stxsibx vs0,0,r9
   0x0000000010000730 <+68>:	li      r9,0

So I guess that the difference is the default -mcpu setting between gcc versions
Comment 12 Carl Love 2021-10-29 21:54:42 UTC
Created attachment 143017 [details]
Fix for the lxsibzx and lxsihzx instrutions

The attached patch fixes the issues with the memcheck/tests/linux/dlclose_leak test.  The issue only occurs on newer compiler with the -mcpu=power9 option which generates the lxsibzx and lxsihzx instructions.
Comment 13 Carl Love 2021-10-29 21:56:10 UTC
Mark, give the patch a spin and see if it works for you.  The test results on my machine show the test is fixed.  

== 671 tests, 4 stderr failures, 1 stdout failure, 1 stderrB failure, 0 stdoutB f\
ailures, 2 post failures ==
gdbserver_tests/hginfo                   (stderrB)
memcheck/tests/bug340392                 (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/ppc64/power_ISA2_05       (stdout)
helgrind/tests/tls_threads               (stderr)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)
Comment 14 Carl Love 2021-10-29 21:59:44 UTC
While fixing the lxsibzx instruction, I noticed the same issue with lxsihzx so fixed it at the same time.  Looking at comment #3 I see you called out the lxsihzx instruction in the badrw test.  Looks like the patch fixed the badrw test as well.
Comment 15 Mark Wielaard 2021-11-01 16:30:39 UTC
(In reply to Carl Love from comment #13)
> Mark, give the patch a spin and see if it works for you.

You patch fixes the following 4 tests:
-memcheck/tests/badrw                     (stderr)
-memcheck/tests/linux/dlclose_leak-no-keep (stderr)
-memcheck/tests/linux/dlclose_leak        (stderr)
-helgrind/tests/free_is_write             (stderr)

And introduces no regresssions for me
Comment 16 Carl Love 2021-11-01 17:53:01 UTC
The patch has been committed.  

commit 6e08ee95f7f1b1c3fd434fa380cc5b2cc3e3f7c7 (HEAD -> master, origin/master, origin/HEAD)
Author: Carl Love <cel@us.ibm.com>
Date:   Fri Oct 29 16:30:33 2021 -0500

    Bug 444571 - PPC, fix the lxsibzx and lxsihzx so they only load their respective sized data.
    
    The lxsibzx was doing a 64-bit load.  The result was initializing
    additional bytes in the register that should not have been initialized.
    The memcheck/tests/linux/dlclose_leak test detected the issue.  The
    code generation uses lxsibzx and stxsibx with -mcpu=power9.  Previously
    the lbz and stb instructions were generated.
    
    The same issue was noted and fixed with the lxsihzx instruction.  The
    memcheck/tests/linux/badrw test now passes as well.
    
    https://bugs.kde.org/show_bug.cgi?id=444571