Bug 455826 - Running Valgrind memcheck on a live process without exiting it reports LDL but on graceful exit it does not.
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.17.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
 
Reported: 2022-06-23 04:09 UTC by shapath
Modified: 2022-07-05 05:56 UTC (History)

Description shapath 2022-06-23 04:09:53 UTC
SUMMARY

Hello Guys,

I am not sure this is a bug, but I would like to understand the behavior. I have already gone through the Valgrind documentation.

I am trying to monitor leaks in a running process using vgdb, following the reference below.
Reference: https://valgrind.org/docs/manual/manual-core-adv.html

Step 1. Run valgrind with vgdb.
valgrind  --tool=memcheck --leak-check=full --show-reachable=yes  --vgdb=yes --undef-value-errors=yes --track-origins=yes --child-silent-after-fork=no --trace-children=no --error-exitcode=1 --log-file=/var/log/_valgrind_%p --xml-file=/var/log/_valgrind_xml_%p  <application>

Step 2. Run gdb and attach to the process using:

target remote | vgdb

Step 3. Run the leak check from gdb:

monitor leak_check full reachable any

After running this, I see LDL (leak definitely lost) entries in the report. But if I exit the program gracefully, these leaks do not appear in the generated report.

Is it possible that the monitor command runs at an arbitrary point in the application's execution and treats memory that is going to be freed at shutdown as definitely lost?

Is it possible that the monitor command is reporting false positives?
Or are these real leaks that are somehow hidden at exit by pointer arithmetic or something else?

Please point me to any reference if it is already available.

Thanks
Shapath




Comment 1 Paul Floyd 2022-06-23 06:35:18 UTC
It's impossible to say without more details.

Does the application do any pointer munging?
Comment 2 Philippe Waroquiers 2022-06-23 18:30:03 UTC
Note that the leak search algorithm scans memory starting from the "root" memory zones (stacks, global variables, registers, ...).
During this scan, any aligned word of memory that happens to point at a block is treated as a pointer.
So, for example, if an integer variable happens to have the same bit representation as the address of an allocated (but lost) block,
the leak search will not report the lost block as a leak, because it has found a "pointer" to this block.

So, depending on what the process does before exit, it might create some bit patterns that look like pointers.

The leak search algorithm can thus have false negatives: some real leaks might not be detected.
I do not see how the leak search algorithm could produce a false positive lost block (ignoring the possibility that the algorithm
is buggy, of course).

Note also that monitor leak_check simply runs the same leak search algorithm that is used by client requests and at exit.

As Paul said, more info (e.g. what does the leak stack trace look like? Is such a leak report plausible when it is detected?) might clarify.
Comment 3 shapath 2022-06-24 04:37:14 UTC
Thank you for the responses!
The leak is reported in third-party code. Based on your input, I analyzed the code and wrote a similar sample program in which I was able to reproduce this.

The interesting thing I found is that it is related to structure packing. If I do not use packing, I do not see this issue.


Here is the sample code:
======================

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* sleep() */
#define YPACK __attribute__((packed))
//#define YPACK

typedef struct sample_t
{
    unsigned short        header;
    struct sample_t       *prev;
    struct sample_t       *next;
} YPACK MY_Node;


typedef  struct abc
{
   MY_Node node;
   char *name;   /* not pointer-aligned when YPACK is packed */
} YPACK xyz;

//Global

xyz *node;

void my_malloc(char **t)
{
   {
      *t  = strdup("hello");   /* 6-byte block, referenced only via node->name */
   }
}

int main(void)
{
   char *t;
   node = malloc(sizeof(xyz));
   my_malloc(&(node->name));

   while(1)
      sleep(1);   /* keep the process alive so vgdb can attach */
   //free(t);

   return 0;
}


Valgrind report:-
==============
(gdb) monitor leak_check full reachable any
==30830== 6 bytes in 1 blocks are definitely lost in loss record 1 of 2
==30830==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==30830==    by 0x4EC3B89: strdup (in /usr/lib64/libc-2.17.so)
==30830==    by 0x4005D2: my_malloc (valgrind.c:29)
==30830==    by 0x400606: main (valgrind.c:37)
==30830==
==30830== 26 bytes in 1 blocks are still reachable in loss record 2 of 2
==30830==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==30830==    by 0x4005EC: main (valgrind.c:36)
==30830==
==30830== LEAK SUMMARY:
==30830==    definitely lost: 6 bytes in 1 blocks
==30830==    indirectly lost: 0 bytes in 0 blocks
==30830==      possibly lost: 0 bytes in 0 blocks
==30830==    still reachable: 26 bytes in 1 blocks
==30830==         suppressed: 0 bytes in 0 blocks
==30830==


Thanks
Shapath
Comment 4 Philippe Waroquiers 2022-06-24 09:17:08 UTC
(In reply to shapath from comment #3)
>
> Valgrind report:-
> ==============
> (gdb) monitor leak_check full reachable any
When compiling with gcc -g -O0 and doing the leak search,
I do not get any definitely or possibly lost block. The leak search reports
2 still reachable blocks.

You can use the following to see why a block is still reachable
(where 0x4a330a0 is the address of the strdup-ed "hello", and "mo w" is short for "monitor who_points_at"):
(gdb) mo w 0x4a330a0
==8392== Searching for pointers to 0x4a330a0
==8392== tid 1 register R8 pointing at 0x4a330a0
(gdb) 


As you can see, in my case, the address of the just-allocated name still happens
to be in a register.

When I force main to return, name is then reported as definitely lost
(as the register pointing to name is likely reused for something else).
Comment 5 shapath 2022-06-24 10:06:23 UTC
(In reply to Philippe Waroquiers from comment #4)

I tried the suggestion to compile with -O0. I also modified the program to print the address of the strdup-ed "hello" before it enters the infinite while loop. I still see it reported as a definite leak.

I tried "mo w 0x52050a0", where 0x52050a0 is that address, and it does not report any register or memory location pointing at it.

(sjohri/coding)$ gcc -g -O0 valgrind.c  -o val_exmple
(sjohri/coding)$ valgrind --log-file=/var/tmp/_valgrind_%p --xml-file=/var/tmp/_valgrind_xml_%p  ./val_exmple

 "The strdup-ed address is 0x52050a0"

(sjohri/coding)$ gdb val_exmple
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.0.1.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/sjohri/coding/val_exmple...done.
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 86547
Reading symbols from /usr/libexec/valgrind/vgpreload_core-amd64-linux.so...done.
Loaded symbols for /usr/libexec/valgrind/vgpreload_core-amd64-linux.so
Reading symbols from /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so...done.
Loaded symbols for /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000000004efc9e0 in __nanosleep_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.0.1.el7_9.x86_64
(gdb) monitor leak_check full reachable any
==86547== 6 bytes in 1 blocks are definitely lost in loss record 1 of 2
==86547==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==86547==    by 0x4EC3B89: strdup (in /usr/lib64/libc-2.17.so)
==86547==    by 0x4006B2: my_malloc (valgrind.c:29)
==86547==    by 0x40070E: main (valgrind.c:39)
==86547==
==86547== 26 bytes in 1 blocks are still reachable in loss record 2 of 2
==86547==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==86547==    by 0x4006F4: main (valgrind.c:38)
==86547==
==86547== LEAK SUMMARY:
==86547==    definitely lost: 6 bytes in 1 blocks
==86547==    indirectly lost: 0 bytes in 0 blocks
==86547==      possibly lost: 0 bytes in 0 blocks
==86547==    still reachable: 26 bytes in 1 blocks
==86547==         suppressed: 0 bytes in 0 blocks
==86547==
(gdb) mo w 0x52050a0
==86547== Searching for pointers to 0x52050a0
(gdb)
Comment 6 shapath 2022-06-24 10:11:32 UTC
I understand the still-reachable part, but I am not able to understand this one:

==30830== 6 bytes in 1 blocks are definitely lost in loss record 1 of 2
==30830==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==30830==    by 0x4EC3B89: strdup (in /usr/lib64/libc-2.17.so)
==30830==    by 0x4005D2: my_malloc (valgrind.c:29)
==30830==    by 0x400606: main (valgrind.c:37)
Comment 7 Philippe Waroquiers 2022-06-26 17:34:07 UTC
On Fri, 2022-06-24 at 10:06 +0000, shapath wrote:
> https://bugs.kde.org/show_bug.cgi?id=455826
> 
> --- Comment #5 from shapath <meeeeshu@gmail.com> ---
> 
> Tried the suggestion to compile with -O0.  Also modified program to print the
> address for strdup-ed "hello" before the program hits the infinite while loop.
> i see it reported as a definite leak.
> 
>  I tried "mo w 0x52050a0" which does not return any reference register where
> 0x52050a0 is the address.
Depending on the code generated by the compiler and the moment at which a leak search
is done, a pointer might still be present (or not) in one register.

> (gdb) monitor leak_check full reachable any
> ==86547== 6 bytes in 1 blocks are definitely lost in loss record 1 of 2
> ==86547==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
> ==86547==    by 0x4EC3B89: strdup (in /usr/lib64/libc-2.17.so)
> ==86547==    by 0x4006B2: my_malloc (valgrind.c:29)
> ==86547==    by 0x40070E: main (valgrind.c:39)
As the struct member "name" is not aligned, the content of "char *name"
is not considered as a pointer by the leak search, and so the strdup-ed string is reported
as definitely lost in your case.



Philippe
Comment 8 shapath 2022-07-05 05:56:35 UTC
Thanks, Philippe, for the response.

So does this mean that Valgrind will report false positives for LDL whenever structure packing is used?
As I mentioned, this is third-party code, and packing is applied to many structures that are used extensively, in order to save memory.
Valgrind reports leaks at all of these places.

Is there any flag or workaround available for this problem?

I searched and found that ASan's LeakSanitizer has a flag that helps to bypass this problem:
https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer


"use_unaligned (default: 0): If 0, LSan will only consider properly aligned 8-byte patterns when looking for pointers. Set to 1 to include unaligned patterns. This refers to the pointer itself, not the memory being pointed at."

Is there any similar option in Valgrind?