Bug 296318

Summary: Debug info improvements (more than one rx/rw mapping)
Product: [Developer tools] valgrind Reporter: Jiří Hruška <jirka>
Component: generalAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: minor CC: hrkzmnm, philippe.waroquiers
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Proposed patch (against SVN r12450)
Proposed patch v2 (against SVN r12566)
Proposed patch v3 (against SVN r12734)
Proposed patch v4 (against SVN r12734) - only the necessary PDB reader changes
PDB reader cleanup (against SVN r12734 + the main patch)

Description Jiří Hruška 2012-03-19 00:18:01 UTC
User-Agent:       Opera/9.80 (Windows NT 6.1; WOW64; U; Edition Next; en) Presto/2.10.269 Version/12.00
Build Identifier: 

We use utility called "patchelf" at work to change rpath in binaries late during build/packaging process. The tool creates an additional ELF section with the new rpath and mangles the file in a way which breaks Valgrind's debug symbol loader. This makes the applications currently unable to run under Valgrind.

Reproducible: Always

Steps to Reproduce:
1. Obtain/build patchelf (http://nixos.org/patchelf.html).
2. Patch some program, e.g.: cp /bin/ls . && ./patchelf --set-rpath /lib ./ls
3. valgrind ./ls

Actual Results:  
--6092-- WARNING: Serious error when reading debug info
--6092-- When reading debug info from .../ls:
--6092-- Can't make sense of .got section mapping
valgrind: m_debuginfo/readelf.c:1209 (vgModuleLocal_read_elf_debug_info): Assertion '!di->soname' failed.
==6092==    at 0x3802812E: ??? (in /usr/lib/valgrind/memcheck-x86-linux)


Expected Results:  
The program runs fine like without patching.


The root problem is that the debug info loader would need to remember three mmappings, 1 rx and 2 rw, instead of the usual 1 rx and 1 rw, which it is made to support. Somewhat related work was done under r11794 for compatibility with elfhack used by Mozilla, yet only in the ELF loader with the program sections.

I originally created some crude hack to work with this kind of modified binaries on Valgrind 3.6.something, now I decided to bite the bullet and make it all fully generic to support any weird layouts there might be even in the future and try to let it get accepted into the mainline.

What I did was basically changing the simple FSM structure used by the debug info loader to contain a dynamic array of all seen mappings instead of fixed fields per memory access, similarly in the ELF loader with the program sections, then making the rest of the code work with it this way.

In the end, the debug info for e.g. these patchelf'ed binaries is tried to be loaded once after we have both rw and rx mappings, but fails due to the missing second rw (or whichever) mapping. Only after the second (or other subsequent) mmap(), it finally succeeds.


== Affected components ==

* Common debug info (debuginfo.c, storage.c, priv_storage.h)
  - Removed fixed {rx,rw,ro}_map_{avma,size,bias} fields from _DebugInfoFSM, added XArray of struct _DebugInfoMapping, containing the same but in any quantity
  - Changed various lookups, checks and debug messages to work with the DI mapping array instead of the fixed FSM fields
  - Added helper function ML_(find_rx_mapping) for the most common operation on the array of mappings

* ELF reader (readelf.c)
  - ML_(read_elf_debug_info): Used XArray of RangeAndBias instead of two fixed arrays for the RX and RW areas
  - Changed the mapping lookups, checks and debug messages to work with the DI mapping array instead of the fixed FSM fields

* Mach-O stuff (readmacho.c)
  - Changed the mapping lookups, checks and debug messages to work with the DI mapping array instead of the fixed FSM fields

* PDB/Wine stuff (readpdb.c etc.)
  - Cleaned up the biasing mess in PDB loader and documented how it really works (seeing how much other ugly stuff is in there, I'm actually thinking about sanitizing it more thoroughly, perhaps fixing bug 211529 etc.)
  - Changed the PE section parser to add the mapped areas to DI mapping array instead of the fixed FSM fields
  - Added fpo_base_avma to _DebugInfo, because there was no better variable to use for "current image base" (AFAICS)


== Performance ==
I did not expect any significant performance impact as all the complex changes are in code which runs only once at initialization per each loaded module. However running `make perf` and comparing with unpatched version showed very small, but stable slowdown in some tests.
I tracked this down to ML_(addLineInfo), ML_(addDiCfSI), ML_(addVar) etc., calling the newly added ML_(find_rx_mapping) function. It is simple enough by itself, but those parent functions can be easily called hundreds of thousands times and then everything counts, even the external calls to VG_(sizeXA) and VG_(indexXA). So I added a dumb caching of the last matched mapping and this seemed to make the performance differences negligible. Maybe the external calls could be eliminated completely, I don't know.


== What's missing ==
- Testing on Mac/OS X (I don't have access to such system and I don't even know if the code compiles, let alone if it works incl. the Mac OSX 10.7 RO-mapping stuff etc.)
- Finding any other possible performance or compatibility problems caused by the patch
Comment 1 Jiří Hruška 2012-03-19 00:19:01 UTC
Created attachment 69724 [details]
Proposed patch (against SVN r12450)
Comment 2 Philippe Waroquiers 2012-05-17 14:10:10 UTC
I tested the patch on f12/x86, deb6/amd64 and f16/ppc64.
No regression, 6 more successful tests on f16/ppc64.

Due to lack of knowledge in debug info readers, cannot review it properly (and so no commit by me).

Just one remark: in m_debuginfo, all VG_(newXA) are using ML_(dinfo_zalloc/free) .
In the patch, a call to newXA is introduced using VG_(mallloc/free).
Comment 3 Jiří Hruška 2012-05-17 15:05:42 UTC
Hi Philippe, thanks for trying it out. Seeing it has helped with some tests on ppc64 was an unexpected nice surprise.

Good catch with the (de)alloc callbacks - I had actually found out about this m_debuginfo specialty and changed them elsewhere, but missed this newXA() call. I'll update the patch, also to keep it in line with the SVN.
Comment 4 Jiří Hruška 2012-05-17 15:06:47 UTC
Created attachment 71157 [details]
Proposed patch v2 (against SVN r12566)
Comment 5 Julian Seward 2012-07-11 14:52:35 UTC
> Created attachment 71157 [details]
> Proposed patch v2 (against SVN r12566)

Some initial comments:

* This contains a whole bunch of changes in
  coregrind/m_debuginfo/readpdb.c.  Why is that?  Are they related?
  I thought this patch was only for ELF object reading.

* i/j mixup
  +               for(j = 0; i < VG_(sizeXA)(di->fsm.maps); i++) {

* Am concerned that it will break OSX, which was hard to make work
  properly.  Eg, where did this get moved to?

  -   if (is_ro_map) {
  -      /* We have a r-- mapping.  Note the details (OSX 10.7, 32-bit only) */
  -      if (!di->fsm.have_ro_map) {
  -         di->fsm.have_ro_map = True;
  -         di->fsm.ro_map_avma = a;
  -         di->fsm.ro_map_size = seg->end + 1 - seg->start;
  -         di->fsm.ro_map_foff = seg->offset;
  -      } else {
  -         /* FIXME: complain about a second r-- mapping */
  -      }
  -   }

Overall, I feel like the fixes this patch does are good, but I am
concerned that it is risky -- going to cause breakage.
Comment 6 Jiří Hruška 2012-07-12 15:32:06 UTC
Created attachment 72476 [details]
Proposed patch v3 (against SVN r12734)
Comment 7 Jiří Hruška 2012-07-12 15:32:35 UTC
> * This contains a whole bunch of changes in
>   coregrind/m_debuginfo/readpdb.c.  Why is that?  Are they related?
>   I thought this patch was only for ELF object reading.
Well it started with ELF files with more mappings, but in order to
allow them in ELF, the common structures needed to change too and
that triggered modifications in other readers.
  Specifically in the Win/PDB stuff, it went like this: To be able to
even compile the readpdb.c with my changes, I needed to change the
part which goes through PE section headers and sets rx/rw_map_avma
etc., because dynamic arrays are used now instead. I noticed that
both rx_map_avma and text_bias are used at different places through
the rest of the code using the horrible BIAS_FOR_WHATEVER macros.
  So there, I could either patch it blindly or build Wine and see
how to adapt the PDB reader properly. After some debugging and
making it all work, I saw the biasing mess was not necessary, so
I bit the bullet and went to remove it alogether, name stuff like
unknown_purpose__reloc correctly and document it all properly.
  These changes were influenced directly by changing the way debug
info mappings are stored, even if PE-specific, so I included them in
the patch. However I see your point there are now some changes not
related to *just* the possibility of multiple mappings -- so I can
split the patch if you'd like.

> * i/j mixup
>   +               for(j = 0; i < VG_(sizeXA)(di->fsm.maps); i++) {
Bah, thanks! Hate that kind of errors... (Shows I had no debug
symbols in separate files on the systems I tested it.) Verified there
is no other for-typo like that.

> * Am concerned that it will break OSX, which was hard to make work
>   properly.  Eg, where did this get moved to?
>   -   if (is_ro_map) {
>   -      /* We have a r-- mapping.  Note the details (OSX 10.7, 32-bit only) */
>   -      if (!di->fsm.have_ro_map) {
>   -         di->fsm.have_ro_map = True;
>   -         di->fsm.ro_map_avma = a;
>   -         di->fsm.ro_map_size = seg->end + 1 - seg->start;
>   -         di->fsm.ro_map_foff = seg->offset;
>   -      } else {
>   -         /* FIXME: complain about a second r-- mapping */
>   -      }
>   -   }
This extra block is no longer necessary, because all mappings are
stored in the di->fsm.maps array in a generic matter.
+   map.ro   = is_ro_map;
+   VG_(addToXA)(di->fsm.maps, &map);
+   ...
+   di->fsm.have_ro_map |= is_ro_map;
Similarly, the counterpart in VG_(di_notify_vm_protect) which upgrades
the mapping finds the relevant RO mapping to change to RW in the array
and updates it.

> Overall, I feel like the fixes this patch does are good, but I am
> concerned that it is risky -- going to cause breakage.
I know, there is quite a lot of changes in code which, which is kind
of essential to make V useful. It's definitely useful though, so I hope
it will be possible to make it ready for commit.
Comment 8 Jiří Hruška 2012-07-12 16:43:09 UTC
Created attachment 72482 [details]
Proposed patch v4 (against SVN r12734) - only the necessary PDB reader changes
Comment 9 Jiří Hruška 2012-07-12 16:43:50 UTC
Created attachment 72483 [details]
PDB reader cleanup (against SVN r12734 + the main patch)
Comment 10 Julian Seward 2012-07-13 11:24:33 UTC
(In reply to comment #8)
> Created attachment 72482 [details]

Committed, r12735.
Comment 11 Jiří Hruška 2012-07-13 12:23:58 UTC
Tested with wine 1.5.0, needed recompiling with Valgrind headers available in order to compile in the necessary client callback.

Test program, compiled with Microsoft Visual C++ 2010, CFLAGS=/Od /Zi (no optimizations, debug info), LFLAGS=/debug (debug info).
---
#include <stdio.h>
#include <stdlib.h>

void* thisleaks()
{
  return malloc(1234);
}

int main()
{
  void* buffer = thisleaks();
  printf("Yay!\n");
  return 0;
}
---
Generates test.exe and test.pdb.

Results:
Program runs OK,
--trace-symtab=yes shows symbols being loaded correctly,
output contains relevant stack trace:
==9842== 1,234 bytes in 1 blocks are definitely lost in loss record 74 of 75
==9842==    at 0x477121C: RtlAllocateHeap (heap.c:254)
==9842==    by 0x4D143F6: MSVCRT_malloc (heap.c:312)
==9842==    by 0x40151D: thisleaks (in /home/jirka/dev/test/test.exe)
==9842==    by 0x401538: main (in /home/jirka/dev/test/test.exe)
I did not see this stack trace in the report before patching.
Comment 12 Julian Seward 2012-07-13 12:59:57 UTC
(In reply to comment #9)
> Created attachment 72483 [details]

Committed, r12736.  Thanks for the patches.
Comment 13 Julian Seward 2012-07-13 13:56:30 UTC
*** Bug 300195 has been marked as a duplicate of this bug. ***