Bug 289939 - wish: complete monitor cmd 'leak_check' with details about leaked or reachable blocks.
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.7.0
Platform: Unlisted Binaries Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-27 17:22 UTC by Philippe Waroquiers
Modified: 2012-01-26 23:20 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
2 new gdbserver commands, allowing fine-grained interactive examination of leak search (56.90 KB, text/plain)
2011-12-27 22:39 UTC, Philippe Waroquiers
updated to a recent svn version (56.81 KB, text/plain)
2012-01-22 21:57 UTC, Philippe Waroquiers

Description Philippe Waroquiers 2011-12-27 17:22:42 UTC
There are some situations in which it is useful to have more information
about the blocks that are merged in a loss record.
Among other things, this is useful to find out why some blocks are still
reachable and/or to understand the reasons for indirect leaks.

These functionalities are provided via two GDB Server commands:
  block_list <loss_record_nr>
        after a leak search, shows the list of blocks of <loss_record_nr>
  who_points_at <addr> [<len>]
        shows the places pointing inside the <len> (default 1) bytes starting at <addr>
        (with len 1, only "start pointers" pointing exactly at <addr> are shown;
         with len > 1, "interior pointers" are also shown)
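
As an illustration, a session looks like this (hypothetical loss record
number and address, output elided):

  (gdb) monitor leak_check full reachable any
  ...
  (gdb) monitor block_list 16
  ... the blocks merged in loss record 16, with address, size and
      allocation stack ...
  (gdb) monitor who_points_at 0x4028028
  ... the places (memory, registers) holding a pointer to 0x4028028 ...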

I will soon attach a patch implementing the above two commands.

This wish was described
  1. by Wilfried Goesgens on the Valgrind users mailing list, in
     http://sourceforge.net/mailarchive/message.php?msg_id=28584812
  2. in coregrind/m_gdbserver/README_DEVELOPERS
Comment 1 Philippe Waroquiers 2011-12-27 22:39:57 UTC
Created attachment 67169 [details]
2 new gdbserver commands, allowing fine-grained interactive examination of leak search

This patch implements two new memcheck gdbserver monitor commands:
  block_list <loss_record_nr>
        after a leak search, shows the list of blocks of <loss_record_nr>
  who_points_at <addr> [<len>]
        shows the places pointing inside the <len> (default 1) bytes starting at <addr>
        (with len 1, only "start pointers" pointing exactly at <addr> are shown;
         with len > 1, "interior pointers" are also shown)


Compiled and reg-tested on f12/x86.
Documentation (memcheck user manual) done.
Still to do: new gdbserver_tests testing these 2 new commands.

The 'block_list' command is implemented on top of the lr_array/lc_chunks/lc_extras
arrays used during the last leak search.
NB: this has no memory impact for the typical Valgrind usage, where a leak
search is only done at the end of the run.
Printing the block_list of a loss record simply consists of scanning the
lc_chunks to find the chunks corresponding to the loss record for which
the block list is requested.
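
In pseudo-C, the scan is essentially the following (a sketch only: the
matching criterion below is an assumption about how chunks are merged in
a loss record, and the real code also handles the cliques of indirectly
leaked blocks, see the print_clique sketch further below):

   static void block_list_sketch ( LossRecord* lr )
   {
      Int i;
      for (i = 0; i < lc_n_chunks; i++) {
         MC_Chunk* ch = lc_chunks[i];
         LC_Extra* ex = &lc_extras[i];
         /* a chunk was merged in lr if it has the same reachability
            state and the same allocation stack as the record's key */
         if (ex->state == lr->key.state
             && ch->where == lr->key.allocated_at)
            VG_(umsg)("%#lx[%lu]\n", ch->data, ch->szB);
      }
   }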

The 'who_points_at' command is implemented by doing a scan similar to (but
simpler than) the leak search scan.
lc_scan_memory has been enhanced with a mode that searches for pointers to a
specific address, rather than for pointers to all allocated blocks.
VG_(apply_to_GP_regs) has been enhanced to also provide the ThreadId and
register name to the callback function.
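
The register scan callback is then something like (a sketch;
searched_wpa and searched_len are made-up names for file-scope copies of
the who_points_at arguments):

   static Addr  searched_wpa;   /* <addr> argument */
   static SizeT searched_len;   /* <len> argument, default 1 */

   static void search_address_in_GP_reg (ThreadId tid, Char* regname, Addr ptr)
   {
      /* report the register if its value points inside the searched range */
      if (ptr >= searched_wpa && ptr < searched_wpa + searched_len)
         VG_(umsg)("tid %d register %s points at %#lx\n",
                   (Int)tid, regname, ptr);
   }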

The patch touches multiple files (but most changes are easy/trivial or
factorise existing code).

A review is preferable for the following files/changes:

memcheck/docs/mc_manual.xml : document the new commands
memcheck/mc_leakcheck.c :
    * changed the LC_Extra struct to remember the clique for indirect leaks
      (size of structure not changed).
    * made lr_array a static global
    * changed lc_scan_memory:
        to have a search mode for a specific address (for who_points_at);
        (for the leak search) to pass a 'current clique' in addition to the clique
          leader, so as to have a proper clique hierarchy for indirectly leaked blocks.
    * print_results: reset values at the beginning of print_results of the next
      leak search, rather than at the end of print_results of the previous leak search.
      This allows the same info to stay shown for loss records until a new
      leak search is done.
    * new function print_clique which recursively prints a group of leaked blocks,
      starting from the clique leader (a sketch follows this list).
    * new function MC_(print_block_list) : calls print_clique for each clique
      leader found for the given loss record.
    * static void scan_memory_root_set : code extracted from MC_(detect_memory_leaks)
        (no relevant change)
    * void MC_(who_points_at) : calls scan_memory_root_set, lc_scan_memory
        and VG_(apply_to_GP_regs)(search_address_in_GP_reg) to search for
        pointers to the given address.
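
To make the clique printing announced above concrete, here is a sketch
(it assumes the LC_Extra of an indirectly leaked block records the
lc_chunks index of the clique it directly belongs to; the output format
is made up):

   static void print_clique ( Int leader )
   {
      Int i;
      /* print the leader's block, then recursively the blocks that
         are only reachable via it */
      VG_(umsg)("%#lx[%lu]\n",
                lc_chunks[leader]->data, lc_chunks[leader]->szB);
      for (i = 0; i < lc_n_chunks; i++)
         if (i != leader
             && lc_extras[i].state == IndirectLeak
             && lc_extras[i].IC.clique == (SizeT)leader)
            print_clique(i);
   }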
Comment 2 Wilfried Goesgens 2011-12-28 17:07:09 UTC
hm, am I doing something wrong here?


==26450== 655,360 bytes in 2 blocks are still reachable in loss record 2,055 of 2,055
==26450==    at 0x4C27673: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26450==    by 0x5C9F661: IncreaseBuf (stringbuf.c:311)
==26450==    by 0x5CA7FD1: StrBufSipLine (stringbuf.c:4557)
==26450==    by 0x48BAE4: RSSAggregator_ParseReply (rss_atom_parser.c:832)
==26450==    by 0x45D6AA: gotstatus (serv_eventclient.c:135)
==26450==    by 0x45D85E: stepmulti (serv_eventclient.c:178)
==26450==    by 0x45D8EB: got_in (serv_eventclient.c:190)
==26450==    by 0x58754D0: ev_invoke_pending (in /usr/lib/libev.so.4.0.0)
==26450==    by 0x587A51C: ev_run (in /usr/lib/libev.so.4.0.0)
==26450==    by 0x45F141: client_event_thread (serv_eventclient.c:560)
==26450==    by 0x44E92E: CTC_backend (threads.c:158)
==26450==    by 0x5EBFB4F: start_thread (pthread_create.c:304)
==26450== 
==26450== LEAK SUMMARY:
==26450==    definitely lost: 51,288 bytes in 22 blocks
==26450==    indirectly lost: 1,152 bytes in 24 blocks
==26450==      possibly lost: 814,515 bytes in 1,091 blocks
==26450==    still reachable: 2,609,474 bytes in 6,016 blocks
==26450==         suppressed: 0 bytes in 0 blocks
==26450== 
(gdb) block_list 2,055
Undefined command: "block_list".  Try "help".
(gdb)
Comment 3 Wilfried Goesgens 2011-12-28 17:47:29 UTC
ah.
monitor block_list 2055

http://apps.rockyou.com/images/facebook/apps/superwall/2008_08_07_rock_rule.jpg

;)
Comment 4 Philippe Waroquiers 2012-01-22 21:57:05 UTC
Created attachment 68099 [details]
updated to a recent svn version
Comment 5 Julian Seward 2012-01-23 10:55:22 UTC
(In reply to comment #4)
> Created an attachment (id=68099) [details]
> updated to a recent svn version


This is quite a complicated patch.  No objections in principle, but I
have the following concerns:

* The leak checker is already quite complex, and has been completely
  rewritten about 4 times, most recently by Nick Nethercote.  I'm
  concerned that these changes make it harder to understand and
  maintain.  Is there any way to make the changes in a way which has
  less impact on the existing code?  (I am not saying there is, but
  it's worth asking).

* (sanity check): when this extra functionality is not in use (iow, we
  are doing a normal leak check), do we touch any memory pages that we
  didn't before?  In the past we have had problems where the leak
  checker touches memory pages that are not accessible, there is a
  segfault, complicated and bad things happen.  So I want to check
  that we are not at further risk of that, as a result of this patch.

* Generally, can we avoid creating new global variables?  If V ever
  gets multithreaded, global variables will be a massive pain to deal
  with, so I'd prefer to avoid generating new ones.



Smaller comments:

+==20852== *0x8049e20 pointing at 0x4028028

I'd prefer "points at" to "pointing at"


SizeT MC_(cmalloc_n_frees) ( void )

change to MC_(get_cmalloc_n_frees)


+      union {
+         SizeT indirect_szB : (sizeof(SizeT)*8)-3; // If Unreached, how many bytes
+                                                   //   are unreachable from here.
+         SizeT  clique :  (sizeof(SizeT)*8)-3;      // if IndirectLeak, clique leader
+                                                   // to which it belongs.
+      } IC;

I assume that IC means "indirect or clique".  I think that code that
uses this would be clearer if you changed it to "IorC" (if my assumption
is correct, that is)



+// To detect that lc_chunks is valid, we store the number of free
+// operations done when lc_chunks was built: lc_chunks (and lc_extras)
+// stay valid as long as no free operation has been done since then.

Do you have some assertions that check that lc_chunks is still valid?


+lc_push_if_a_chunk_ptr_register(ThreadId tid, Char* regname, Addr ptr)

please use HChar*, not Char*, for literal strings (like here, regname)
Comment 6 Philippe Waroquiers 2012-01-23 21:02:56 UTC
(In reply to comment #5)

> * The leak checker is already quite complex, and has been completely
>   rewritten about 4 times, most recently by Nick Nethercote.  I'm
>   concerned that these changes make it harder to understand and
>   maintain.  Is there any way to make the changes in a way which has
>   less impact on the existing code?  (I am not saying there is, but
>   it's worth asking).
Most of the big changes in mc_leakcheck.c extract some existing
pieces of code into their own functions, without changing the logic.
It is not clear to me how to further separate the "search" from
the "full leak search" without duplicating some code.

> * (sanity check): when this extra functionality is not in use (iow, we
>   are doing a normal leak check), do we touch any memory pages that we
>   didn't before?  In the past we have had problems where the leak
>   checker touches memory pages that are not accessible, there is a
>   segfault, complicated and bad things happen.  So I want to check
>   that we are not at further risk of that, as a result of this patch.
The core of the leak check algorithm (among other things, which pages are being scanned)
is completely shared between the normal leak check and the new mode,
and has the same logic as before.

> * Generally, can we avoid creating new global variables?  If V ever
>   gets multithreaded, global variables will be a massive pain to deal
>   with, so I'd prefer to avoid generating new ones.
I guess that a multi-threaded Valgrind will have "fine grained" locks
(e.g. read and/or write locks on the translation table) but will also have
the equivalent of the current "big lock" (ensuring that no thread is running
user code, and that only one thread runs some valgrind code).
For the leak check, this "big lock" will have to be acquired (as I cannot imagine
how the leak search would work if another thread were e.g. munmapping, in
parallel, the memory being scanned).
There are other things for which we either need fine-grained locks (e.g. on the
"free list" of memcheck) or alternatively "per thread" variables.
(maybe we could discuss multi-threaded Valgrind more in depth at FOSDEM?)

That being said, I think the current leak search global variables (lc_chunks, lc_n_chunks,
lr_table, ...) could be regrouped into a single struct to make it clearer that
these are all related. Something like:
  struct {
     MC_Chunk ** lc_chunks;
     Int lc_n_chunks;
     LC_Extra* lc_extras;
     OSet* lr_table;
     .... etc ...
  } LeakCheckGlobal;
Currently, there are links/relationships e.g. between lc_extras and lc_chunks.
Having a struct like this looks clearer to me.
If you find this a good idea, I can do this in another patch.


> Smaller comments:
... will do all small comments ....

> +// To detect that lc_chunks is valid, we store the number of free
> +// operations done when lc_chunks was built: lc_chunks (and lc_extras)
> +// stay valid as long as no free operation has been done since then.
> 
> Do you have some assertions that check that lc_chunks is still valid?
Yes, this is checked: if a free has been done since the last leak search,
the monitor command will instruct the user to redo a leak search.
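
In other words, the guard boils down to comparing free counters.
Something like (a sketch; lc_chunks_n_frees_marker is a made-up name for
the counter stored when lc_chunks is built):

   static SizeT lc_chunks_n_frees_marker;  /* set when lc_chunks is built */

   static Bool lc_chunks_still_valid ( void )
   {
      return lc_chunks != NULL
             && lc_chunks_n_frees_marker == MC_(get_cmalloc_n_frees)();
   }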
Comment 7 Julian Seward 2012-01-25 08:33:59 UTC
(In reply to comment #6)
> Most of the big changes in mc_leakcheck.c extract some existing
> pieces of code into their own functions, without changing the logic.

Ok; no objections then; pls proceed.

Two small things:


(From comment #1)
  who_points_at <addr> [<len>]
        shows the places pointing inside the <len> (default 1) bytes starting at <addr>
        (with len 1, only "start pointers" pointing exactly at <addr> are shown;
         with len > 1, "interior pointers" are also shown)

I'm a bit concerned that the convention "1 == start of block, > 1 ==
inside the block" is inconsistent with the terminology used for block
offsets in all the other error messages.  They use "0 bytes inside a
block" == start of block, "> 0 bytes inside a block" == inside the
block.  I am wondering if it would be cleaner to somehow use the same
terminology here.  Hmm, maybe not though.  Maybe using length is more
natural here.


> That being said, I think the current leak search global variables (lc_chunks,
> lc_n_chunks, lr_table, ...) could be regrouped into a single struct to make
> it clearer that these are all related. Something like:
>   struct {
>      MC_Chunk ** lc_chunks;
>      Int lc_n_chunks;
>      LC_Extra* lc_extras;
>      OSet* lr_table;
>      .... etc ...
>   } LeakCheckGlobal;
> Currently, there are links/relationships e.g. between lc_extras and lc_chunks.
> Having a struct like this looks clearer to me.
> If you find this a good idea, I can do this in another patch.

Yes, I think that is a good idea.
Comment 8 Philippe Waroquiers 2012-01-26 23:19:39 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Most of the big changes in mc_leakcheck.c extract some existing
> > pieces of code into their own functions, without changing the logic.
> 
> Ok; no objections then; pls proceed.
Thanks for the review; committed as rev 12357

> (From comment #1)
>   who_points_at <addr> [<len>]
>         shows the places pointing inside the <len> (default 1) bytes starting at <addr>
>         (with len 1, only "start pointers" pointing exactly at <addr> are shown;
>          with len > 1, "interior pointers" are also shown)
> 
> I'm a bit concerned that the convention "1 == start of block, > 1 ==
> inside the block" is inconsistent with the terminology used for block
> offsets in all the other error messages.  They use "0 bytes inside a
> block" == start of block, "> 0 bytes inside a block" == inside the
> block.  I am wondering if it would be cleaner to somehow use the same
> terminology here.  Hmm, maybe not though.  Maybe using length is more
> natural here.

The <len> notation is also used in other monitor commands when speaking about
the size of a piece of memory. As you indicate, block offsets should start
at 0 (i.e. who_points_at will use the block offset notation to describe the
offset at which the discovered pointer points).
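
For example (made-up addresses, output format approximate): after
  (gdb) monitor who_points_at 0x4028028 16
a pointer whose value is 0x4028030 would also be reported, described as
pointing 8 bytes inside the block starting at 0x4028028.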


> > Currently, there are links/relationships e.g. between lc_extras and lc_chunks.
> > Having a struct like this looks clearer to me.
> Yes, I think that is a good idea.
Will do this in a follow-up patch.