Bug 289578 - Backtraces with ARM unwind tables (=without debug symbols) and support for offline symbol resolving
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.7.0
Platform: Meego/Harmattan Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Julian Seward
Reported: 2011-12-22 11:21 UTC by Eero Tamminen
Modified: 2014-09-02 21:40 UTC
Attachments
Path for switching on manual unwinding (2.52 KB, patch)
2013-01-10 09:16 UTC, Vasily
Description Eero Tamminen 2011-12-22 11:21:03 UTC
Version:           3.7.0
OS:                Linux

1. Debug symbols can be huge (especially for C++ code), they take a lot of disk space.

2. Loading/using the debug symbols also takes a huge amount of RAM.  And it's slow.

3. Valgrind and the emulated program can already take all the available memory, even without debug symbols.

All this means that current Valgrind often cannot be used on smaller devices, for debugging processes that most need it (which leak / use most memory).


Reproducible: Always

Steps to Reproduce:
1. Have a system in a state where there's an inadequate amount of RAM for step 3)
2. Have hundreds of MBs of debug symbols for Qt and related libraries used by your program
3. Try Valgrinding the program in a situation where the program itself uses a lot of memory
4. Remove all debug symbols and try again


Actual Results:  
At step 3) the valgrinded program is OOM-killed.
At step 4) the program works, but valgrind backtraces are useless, too short, not resolved etc.


Expected Results:  
A way to get useful debugging information.

(I'm mainly interested in having this on ARM devices, which have less memory and use different unwind table data than x86.)


AFAIK valgrind needs the debug symbols just to:
- unwind the stack to get backtraces
- get function names for error suppression rule matching

If Valgrind used unwind tables for getting backtraces and also output the addresses at which code (binaries/libraries) is loaded on the device, symbol resolving (and error suppression) could be done off-line, on a host which has the corresponding binaries and full debug symbols.


There's already code for doing the unwinding using unwind tables in Glibc backtrace function and sp-rtrace-resolve[1] has code for off-line symbol resolving, for the sp-rtrace trace format[2].  These might be of help in doing something similar in Valgrind.

[1] https://maemo.gitorious.org/maemo-tools/sp-rtrace/trees/master/src/rtrace-resolve
[2] https://maemo.gitorious.org/maemo-tools/sp-rtrace/blobs/master/TEXT_PROTOCOL
Comment 1 Tom Hughes 2011-12-22 12:23:17 UTC
We've been using the unwind tables for ages...
Comment 2 Eero Tamminen 2011-12-22 13:26:10 UTC
(In reply to comment #1)
> We've been using the unwind tables for ages...

On which architectures?

On ARM (MeeGo/Harmattan on N9) I get:
$ valgrind /usr/bin/widgetsgallery
==26939== Memcheck, a memory error detector
==26939== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==26939== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info
==26939== Command: /usr/bin/widgetsgallery
==26939== 
==26939== Invalid read of size 1
==26939==    at 0x6633C8C: ??? (in /usr/lib/libfontconfig.so.1.4.4)
==26939==  Address 0x77a5809 is 7 bytes before a block of size 17 alloc'd
==26939==    at 0x4834EBC: malloc (vg_replace_malloc.c:236)
==26939==    by 0x664B30B: ??? (in /usr/lib/libfontconfig.so.1.4.4)

Although unwind information[1] would seem to be present:
$ readelf -S /usr/lib/libfontconfig.so.1.4.4 | grep ARM.ex
  [13] .ARM.extab        PROGBITS        0002ebbc 02ebbc 000e1c 00   A  0   0  4
  [14] .ARM.exidx        ARM_EXIDX       0002f9d8 02f9d8 000c38 00  AL 10   0  4

[1] https://wiki.linaro.org/KenWerner/Sandbox/libunwind#unwinding_on_ARM


Valgrind output is also missing the binary and library mapping addresses so that symbols could be resolved afterward.
Comment 3 Tom Hughes 2012-04-19 07:34:19 UTC
Those aren't any kind of DWARF unwind tables I've ever heard of.

What we use are DWARF unwind tables, either as .dwarf_frame from the debug information or as .eh_frame from the main program.
Comment 4 Eero Tamminen 2012-04-19 08:24:14 UTC
(In reply to comment #3)
> Those aren't any kind of DWARF unwind tables I've ever heard of.
>
> What we use are DWARF unwind tables, either as .dwarf_frame from the debug
> information or as .eh_frame from the main program.

As I stated, ARM has its own unwind information, and it's in the .ARM.ex* ELF sections.  "readelf -u <ELF binary>" will show you its contents.
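To make the shape of that data more concrete, here is a minimal sketch of how one 8-byte .ARM.exidx entry could be classified, based on my reading of the ARM EHABI document. The function and type names are hypothetical (they are not Valgrind or libgcc API), and the sketch only decodes the index table, not the unwind instructions themselves.

```c
#include <stdint.h>

/* Meaning of the second word of an 8-byte .ARM.exidx entry (ARM EHABI).
   All names here are illustrative, not taken from any existing code. */
typedef enum { EXIDX_CANTUNWIND, EXIDX_INLINE, EXIDX_EXTAB } ExidxKind;

/* Decode a prel31 field: a 31-bit signed offset, relative to the
   address of the word that holds it. */
static uint32_t prel31_to_addr(uint32_t word, uint32_t word_addr)
{
    uint32_t off = word & 0x7fffffffu;
    if (off & 0x40000000u)
        off |= 0x80000000u;      /* sign-extend bit 30 into bit 31 */
    return word_addr + off;      /* wraps modulo 2^32, as on 32-bit ARM */
}

/* Classify the second word of an index entry. */
static ExidxKind exidx_classify(uint32_t second_word)
{
    if (second_word == 1u)
        return EXIDX_CANTUNWIND; /* frame cannot be unwound */
    if (second_word & 0x80000000u)
        return EXIDX_INLINE;     /* compact unwind data stored inline */
    return EXIDX_EXTAB;          /* prel31 offset into .ARM.extab */
}

/* Each index entry is two 32-bit words. */
static uint32_t exidx_entry_count(uint32_t section_size)
{
    return section_size / 8u;
}
```

With the section size from the readelf output in comment 2 (0xc38 bytes), exidx_entry_count(0xc38) gives 391 index entries for libfontconfig.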

This bug is about:
* Valgrind not supporting the ARM unwind information
* Valgrind not providing the library mapping information[1] so that the backtrace
  addresses could later on be resolved to function/method names with debug symbols
  (using them in Valgrind would take too much memory)

[1] The maps file isn't enough: a lot of things dlopen and dlclose extra libraries, so the addresses need to be shown whenever libraries are memory mapped.


(In reply to comment #0)
> There's already code for doing the unwinding using unwind tables in Glibc
> backtrace function

Actually, the Glibc backtrace function just calls the libgcc unwind code.  I think the code is here:
  http://gcc.gnu.org/viewcvs/trunk/libgcc/
Comment 5 Tom Hughes 2012-04-19 08:36:52 UTC
Sure, I was just trying to explain where the confusion arose between us, with me saying unwind information was supported and you saying that it didn't seem to work.

Unfortunately if ARM has decided to use non-standard unwind tables then somebody is going to have to do the work to support them - patches welcome.
Comment 6 Eero Tamminen 2012-04-19 08:51:27 UTC
(In reply to comment #5)
> Unfortunately if ARM has decided to use non-standard unwind tables then
> somebody is going to have to do the work to support them - patches welcome.

I don't have time to look into that myself, but maybe I can persuade somebody else to do that.  Any pointers on where one should start?

Or would it be possible for Valgrind to use the emulated process libgcc unwind functions to get the backtrace addresses?  Or use libunwind?

Libunwind supports debug symbols, unwind information and frame pointer based unwinding (in that order), for many architectures, including ARM.  One minor problem with that is that if there are no debug symbols and no unwind information, and libunwind falls back to frame pointers on code that's compiled *without* them, it can access invalid memory & crash...


Also, what about the support for printing out the library mapping addresses so that backtrace addresses can be later on resolved with debug symbols?   That would help all the architectures...
Comment 7 Julian Seward 2012-04-19 09:06:56 UTC
Eero, this is confusing.  Are you filing a bug about too much memory
consumption for the read debuginfo, or about need to support ARM-style
unwinding, or need to make offline symbol resolving usable -- which?

That said, I understand the kinds of problems you have.  I have been
using V on Firefox on Android, and pulling in the debug info from a
350MB libxul.so is slow and expensive in memory.  I imagine Qt is
similar or worse.

One thing that helps is to add swap -- at least you won't get OOMd.  I
have a Nexus S with a 1GB swap file.  It does require recompiling the
kernel with swap enabled, though, which is a lot of hassle.

re ARM EXIDX unwinding, I've not come across the need for it when
dealing with Firefox on Android or vanilla ARM Linux.  But if there is
a need for it then sure, patches welcome.  It looks a lot simpler than
DWARF CFI.

Tom: Another thing that might make library loading slow is the CRC
calculation that is done relating to finding separate debuginfo files.
Whilst profiling other stuff some time back, I noticed V spends 90
million instructions calculating CRCs for any trivial startup on my U
10.04 box, eg for /bin/date.  I didn't look into it, but it worries me
that perhaps a CRC is calculated for every .so we consider for
debuginfo reading.  If the objects are large and situated on a very
slow flash filesystem (as is the case for me + 350MB libxul.so) then
that might be causing a big delay, as it forces the entire file to be
read from the filesystem.
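To illustrate why such a check forces the whole file to be read, here is a minimal sketch of a standard CRC-32 (reflected polynomial 0xEDB88320), which as far as I know is the checksum used for .gnu_debuglink. This is an assumption-laden illustration, not Valgrind's actual code; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a buffer. Pass 0 as the initial crc, or the
   previous result to continue over the next chunk of the same file.
   Every byte of the input has to pass through the inner loop. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* (0u - (crc & 1u)) is all ones when the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Because the checksum depends on every byte, a 350MB object costs 350MB of reads from flash before any debuginfo is parsed, which would fit the delay described above.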
Comment 8 Julian Seward 2012-04-19 09:08:49 UTC
(In reply to comment #6)
> Also, what about the support for printing out the library mapping addresses
> so that backtrace addresses can be later on resolved with debug symbols?  

Can you show some examples of the output you would like to have?
Comment 9 Tom Hughes 2012-04-19 09:09:05 UTC
We can't use libunwind, at least not dynamically linked, because valgrind is in the same process as the client so can't use any dynamic linking as it would conflict with the client.

The place to start looking at doing a native implementation in valgrind would be VG_(get_StackTrace_wrk) in coregrind/m_stacktrace.c which does the unwinding, either by using frame pointers or by calling VG_(use_CF_info) or VG_(use_FPO_info) to use unwind information.

So for DWARF you have ML_(read_callframe_info_dwarf3) in coregrind/m_debuginfo/readdwarf.c which is the code for reading the DWARF unwind information, triggered from coregrind/m_debuginfo/readelf.c, and VG_(use_CF_info) in coregrind/m_debuginfo/debuginfo.c which takes care of using it to unwind the stack.
Comment 10 Julian Seward 2012-04-19 09:22:07 UTC
There are two ways you could do this.  Either read the EXIDX stuff
into its own data structure (as is done for FPO info on x86) and then
use that data structure in VG_(get_StackTrace_wrk) if the call to
VG_(use_CF_info) fails.

Or (possibly) don't add a new data structure.  Instead, when reading
EXIDX info, create "fake" DiCfSI records.  These store the Dwarf
unwind info.  Then the existing mechanism behind VG_(use_CF_info) will
unwind using EXIDX too.

Whether the second approach will actually work in practice, I am
unsure.
Comment 11 Eero Tamminen 2012-04-19 09:49:29 UTC
(In reply to comment #7)
> Eero, this is confusing.  Are you filing a bug about too much memory
> consumption for the read debuginfo, or about need to support ARM-style
> unwinding, or need to make offline symbol resolving usable -- which?

The ARM specific problem is that Valgrind doesn't work because it either uses too much memory (with debug symbols) or it doesn't provide the needed backtraces (without debug symbols).

The solution to that is using the ARM unwind information, as all Linux distros are now compiled with it.   This is what this bug is primarily about.


However, the generic problem with unwind information is that resolving will then at best be done using the ELF symbol table, so you get only addresses for symbols, and the names may even be wrong (matched to whatever exported function happened to precede the given address instead of the correct static one).
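The mis-attribution can be sketched with a nearest-preceding-symbol lookup. This is a hypothetical illustration with made-up names and addresses, not the lookup code of Valgrind or any other tool.

```c
#include <stddef.h>

/* One exported symbol; the table is assumed to be sorted by address. */
typedef struct {
    unsigned long addr;
    const char   *name;
} Sym;

/* Return the name of the last symbol whose address is <= pc,
   or "???" if pc lies before the first symbol. */
static const char *nearest_symbol(const Sym *syms, size_t n, unsigned long pc)
{
    const char *best = "???";
    for (size_t i = 0; i < n && syms[i].addr <= pc; i++)
        best = syms[i].name;
    return best;
}
```

If the table only holds exported_a at 0x1000 and exported_b at 0x3000, an address of 0x2000 inside an unlisted static function is reported as exported_a, which is the wrong name described above.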

You cannot easily resolve the addresses based on the information Valgrind currently provides, and adding debug symbols for the things you're interested in may again use too much memory (especially if you install them with packages that also pull in all debug symbols for dependent packages).

Providing the library mapping addresses would make it possible to do resolving afterwards (either manually, or using some tool).  sp-rtrace output for libraries looks like this:
: /bin/busybox => 0x8000-0x6f000
: /lib/ld-2.13.so => 0x40025000-0x4003b000
: /lib/librt-2.13.so => 0x40044000-0x40049000
: /lib/libdl-2.13.so => 0x40077000-0x40079000

The info is output whenever an executable code section of a file is memory mapped.
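As a rough sketch of how an off-line tool could use such lines, the following parses one mapping line in the format shown above and finds the mapping that contains a backtrace address. The struct and function names are hypothetical, and treating the end address as exclusive is my assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* One ": <path> => 0x<start>-0x<end>" line. */
typedef struct {
    char          path[256];
    unsigned long start;
    unsigned long end;    /* assumed to be exclusive */
} Mapping;

/* Parse a mapping line; returns 1 on success, 0 on a malformed line. */
static int parse_mapping(const char *line, Mapping *m)
{
    return sscanf(line, ": %255s => %lx-%lx",
                  m->path, &m->start, &m->end) == 3;
}

/* Find the mapping that covers addr, or NULL if none does. */
static const Mapping *find_mapping(const Mapping *maps, size_t n,
                                   unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= maps[i].start && addr < maps[i].end)
            return &maps[i];
    return NULL;
}
```

Once the mapping is found, addr - start is the offset into the mapped file, which a host-side resolver could feed to the matching binary with debug symbols. Because libraries get dlopened and dlclosed, the mapping list would have to be updated in trace order rather than read once.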

If you prefer, I could file a separate bug about that.


> One thing that helps is to add swap -- at least you won't get OOMd.  I
> have a Nexus S with a 1GB swap file.  It does require recompiling the
> kernel with swap enabled, though, which is a lot of hassle.

While it sometimes helps, memory shortage / swap usage with Valgrind often makes programs completely unusable because of the performance impact and some things (like certain UI actions) having timeouts.


> Tom: Another thing that might make library loading slow is the CRC
> calculation that is done relating to finding separate debuginfo files.
> Whilst profiling other stuff some time back, I noticed V spends 90
> million instructions calculating CRCs for any trivial startup on my U
> 10.04 box, eg for /bin/date.  I didn't look into it, but it worries me
> that perhaps a CRC is calculated for every .so we consider for
> debuginfo reading.

The CRC you get from the .gnu_debuglink section should be calculated only for the debug symbol file.  IMHO it's necessary as otherwise you cannot be sure that you got matching debug information.  Getting wrong information is worse than getting no information.

But maybe there could be some option to skip the CRC check, one could use it on further runs of Valgrind as you only need to do the check once.  :-)


> If the objects are large and situated on a very
> slow flash filesystem (as is the case for me + 350MB libxul.so) then
> that might be causing a big delay, as it forces the entire file to be
> read from the filesystem.

This also makes the memory issue worse because it "poisons" the kernel page cache and causes extra paging.
Comment 12 Vasily 2012-10-12 05:32:44 UTC
Hi.
I am trying to add the ability to read unwind information (from .ARM.ex*) to Valgrind. Unfortunately, I do not have a lot of knowledge in this area. I would very much appreciate your help with two questions:
1. Where can I read about the structure of unwind information and how it is used in Valgrind (.dwarf_frame, .eh_frame and .ARM.ex*)?
2. How can I test my implementation of the unwind reader on my program? I compiled a simple C program with two functions:
/////
...
int main() {

    child(10);
    return 0;
}
void child(int x) {
    printf("X = %d", x);
}
///////

Using "valgrind --tool=callgrind NAME_OF_PROGRAM" it is possible to resolve the names of all my functions (main, child) in every case, even on the ARM target! I tried compiling with different options: -fno-exceptions, -funwind-tables, -fasynchronous-unwind-tables, with and without -g.
How must I compile my program to reproduce the situation where callgrind can't resolve the names of functions in my program (for example child)?

Thank you in advance for any help.
Comment 13 Julian Seward 2012-10-12 09:04:50 UTC
(In reply to comment #12)

It would be good to add EXIDX unwinding to Valgrind.

> 1. Where can I read the structure of unwind information and how it is used
> in valgrind (.dwarf_frame, .eh_frame and ARM.ex*).

It is stored in coregrind/m_debuginfo/priv_storage.h, struct
_DebugInfo, fields "cfsi*" (on ARM).

It is used in coregrind/m_stacktrace.c, section inside #if
defined(VGP_arm_linux), function VG_(get_StackTrace_wrk).

> 2. How can I test my realization of the unwind reader on my program? I
> compiled simple program on C with two functions:

For simple unwinding tests I use memcheck/tests/errs1.c.  Run it on
x86 or using the normal dwarf unwind, so as to see what results you
should expect.  And use --tool=memcheck, not --tool=callgrind.

--------

Ah, but are you solving the right problem?  Are you really trying to
make Callgrind work properly on ARM?  That is a different problem.
Comment 14 Julian Seward 2012-10-12 13:58:36 UTC
EXIDX is documented at
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038a/IHI0038A_ehabi.pdf
(thanks to Ted M for the link)
Comment 15 Ted Mielczarek 2012-10-12 14:18:49 UTC
A few notes, as I've dealt with this information a bit--I took the libunwind code for parsing these tables and shoehorned it into Breakpad. You can find my patch here if you're interested: https://breakpad.appspot.com/480002/ (all the code involved is BSD licensed).

(In reply to comment #10)
> Or (possibly) don't add a new data structure.  Instead, when reading
> EXIDX info, create "fake" DiCfSI records.  These store the Dwarf
> unwind info.  Then the existing mechanism behind VG_(use_CF_info) will
> unwind using EXIDX too.

I took this approach for my Breakpad patch. Breakpad had existing DWARF CFI parsing code, and a textual format for describing CFI unwind instructions, so my code translated the ARM unwind instructions into roughly equivalent CFI rules. I didn't spend much time to make the output pretty, but it works.

(In reply to comment #11)
> However, the generic problem with unwind information is that then resolving
> will at best be done using ELF symbol table so you get only addresses for
> symbols, and the names may even be wrong (matched to whatever exported
> function happened to precede the given address instead of the correct static one).

You can reach a reasonable compromise here if you don't need the full DWARF: run just "strip --strip-debug" on the binaries. This will keep the full symbol table but strip out the DWARF debug info. This way you should be able to get useful symbols. We're currently shipping Firefox Nightly builds like this, so that our built-in profiler can get useful stack traces.
Comment 16 Vasily 2012-10-16 04:54:05 UTC
(In reply to comment #13)
OK, because I am new to this, I kindly ask you to clarify one question for me.

> (In reply to comment #12)
> 
> It would be good to add EXIDX unwinding to Valgrind.
> 
> > 1. Where can I read the structure of unwind information and how it is used
> > in valgrind (.dwarf_frame, .eh_frame and ARM.ex*).
> 
> It is stored in coregrind/m_debuginfo/priv_storage.h, struct
> _DebugInfo, fields "cfsi*" (on ARM).
> 
> It is used in coregrind/m_stacktrace.c, section inside #if
> defined(VGP_arm_linux), function VG_(get_StackTrace_wrk).
> 
> > 2. How can I test my realization of the unwind reader on my program? I
> > compiled simple program on C with two functions:
> 
> For simple unwinding tests I use memcheck/tests/errs1.c.  Run it on
> x86 or using the normal dwarf unwind, so as to see what results you
> should expect.  And use --tool=memcheck, not --tool=callgrind.
> 

I tried to run memcheck/tests/errs1.c on the device and on a computer. Here are the results:
On computer (gcc -O1)
==22469== Invalid read of size 1
==22469==    at 0x4004FB: ddd (in a.out)
==22469==    by 0x400504: ccc (in a.out)
==22469==    by 0x40050B: bbb (in a.out)
==22469==    by 0x400512: aaa (in a.out)
==22469==    by 0x400566: main (in a.out)

On computer (gcc -O1 -g)
==22492== Invalid read of size 1
==22492==    at 0x4004FB: ddd (experiments.c:6)
==22492==    by 0x400504: ccc (experiments.c:7)
==22492==    by 0x40050B: bbb (experiments.c:8)
==22492==    by 0x400512: aaa (experiments.c:9)
==22492==    by 0x400566: main (experiments.c:18)

On device (gcc -O1)
==26111== Invalid read of size 1
==26111==    at 0x87B8: ddd (in experiments)

On device (gcc -O1 -g)
==26155== Invalid read of size 1
==26155==    at 0x87B8: ddd (experiments.c:6)
==26155==    by 0x87CF: ccc (experiments.c:7)
==26155==    by 0x87DB: bbb (experiments.c:8)
==26155==    by 0x87E7: aaa (experiments.c:9)
==26155==    by 0x8837: main (experiments.c:18)

And the -funwind-tables flag doesn't affect the output. So, if I add EXIDX unwinding (reading .ARM.ex* and creating "fake" DiCfSI records), the output on DEVICE (gcc -O1 -funwind-tables) will be the same as it is now on COMPUTER (gcc -O1). Am I right?

> --------
> 
> Ah, but are you solving the right problem?  Are you really trying to
> make Callgrind work properly on ARM?  That is a different problem.

In general, the main task is to decrease the memory consumption of Valgrind (working with ARM). As far as I understand, adding support for EXIDX unwinding makes it possible to avoid using the whole debug information, which can slightly decrease the memory consumption.

Thank you in advance.
Comment 17 Kenny Root 2012-10-18 17:26:50 UTC
Android also has a stack unwinding library that might be useful here:

https://android.googlesource.com/platform/system/core/+/master/libcorkscrew/
Comment 18 Vasily 2012-11-27 14:34:47 UTC
Hello,

I am trying to solve the problem with reading .ARM.ex* tables and I have achieved some results. It is possible to obtain info in a format like:
STACK CFI INIT 44f8 114 .cfa: sp .ra: sp 20 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + ^ lr: sp 20 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + ^

So, to obtain sp in the previous stack frame (for addresses 44f8 to 44f8+114) it is necessary to read the value at [sp + 52]. But I can't understand how Valgrind works with DebugInfo and how to generate fake CFI info based on these rules. Could somebody help me understand the DiCfSI format for my case?

Vasily
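For reference, the postfix rule quoted in this comment can be evaluated with a small stack machine. The sketch below is hypothetical and handles only the tokens that appear in that rule ("sp", integers, "+" and "^" for dereference); treating the integers as decimal is an assumption that matches the [sp + 52] reading. As written, the rule assigns the word read at sp + 52 to .ra and lr, while .cfa is sp itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Callback that reads one 32-bit word of (simulated) target memory. */
typedef uint32_t (*ReadWord)(uint32_t addr);

/* Demo memory: every word holds its own address, so the result of a
   dereference shows which address was read. */
static uint32_t identity_read(uint32_t addr) { return addr; }

/* Evaluate a postfix rule such as "sp 20 + 4 + ^".
   Returns 1 and stores the result in *out on success, 0 on error. */
static int eval_postfix(const char *expr, uint32_t sp, ReadWord rd,
                        uint32_t *out)
{
    uint32_t stack[16];
    int depth = 0;
    char buf[256];

    if (strlen(expr) >= sizeof buf)
        return 0;
    strcpy(buf, expr);              /* strtok modifies its input */

    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "+") == 0) {
            if (depth < 2) return 0;
            stack[depth - 2] += stack[depth - 1];
            depth--;
        } else if (strcmp(tok, "^") == 0) {
            if (depth < 1) return 0;
            stack[depth - 1] = rd(stack[depth - 1]);
        } else {
            if (depth == 16) return 0;
            stack[depth++] = strcmp(tok, "sp") == 0
                           ? sp
                           : (uint32_t)strtoul(tok, NULL, 10);
        }
    }
    if (depth != 1) return 0;
    *out = stack[0];
    return 1;
}
```

Evaluating the .ra expression with sp = 0x1000 and identity_read yields 0x1034, i.e. the word at sp + 52 (20 plus eight times 4).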
Comment 19 Vasily 2012-12-19 15:07:39 UTC
Hello,

During my investigation I found that it is possible to enable this line
"if (0/*DISABLED BY DEFAULT*/ && do_stack_scan && i < max_n_ips && i <= 2) {"
in coregrind/m_stacktrace.c and obtain a good backtrace on ARM. Why is this procedure /*DISABLED BY DEFAULT*/?
Is there somewhere I can read in detail about the unwinding performed by this procedure? Any links would be good.

Best regards,
Vasily Golubev
Comment 20 Vasily 2013-01-10 09:16:20 UTC
Created attachment 76362 [details]
Path for switching on manual unwinding

This patch adds two more parameters to Valgrind. Now it is possible to switch on manual unwinding. It can significantly improve the stack trace if there is no debug info in the binary.
Comment 21 Eero Tamminen 2013-04-29 15:42:22 UTC
(In reply to comment #19)
> Why this procedure is /*DISABLED BY DEFAULT*/ ?
> Maybe it is possible to read in details about the unwinding performed by
> this procedure somewhere? Any links will be good.

I took a quick look at that code in Valgrind 3.8.1:
1. First it uses debug info to unwind, and only if that didn't give enough info does it go on to the disabled code you asked about...
2. If the link register points to a different function than the current address (based on symbol table information [1]),
3. it checks whether that value looks like it could be a valid return address, i.e. whether it is in an executable part of memory, preceded by a subroutine call [2] and not a duplicate of the current address [3]
4. Then it checks every long-aligned word [4] in the stack to see whether it looks like a possible valid return address, until it runs out of the current (4KB) stack memory page, or has reported a maximum of 5 pointers [5]

[1] Symbol table lists only global / exported function names -> this is 1st limit.  Checking is done by comparing *strings* returned by symbol lookup code elsewhere in Valgrind, but I guess that's peanuts due to need for symbol address lookup -> performance is 2nd limit.

[2] This needs different checks for different arm instructions, Thumb16, Thumb32, ARM32, Jazelle etc.  Not all of them are checked.  -> 3rd limit.

[3] With this check, I don't think it reports recursion -> 4th limit.  I think obvious optimization would be to do that check before function name resolving & string compare,  add function (to debuginfo.c) which returns just symbol addresses and use that to do compares with symbol addresses instead of their name strings.

[4] Hm. Why UWord, why not Addr?
---
         UWord w = *(UWord*)(sp & ~0x3);
         if (looks_like_RA(w)) {
---

[5] I.e. if there's lot of data in stack, or address happened to be close to stack border you may not get much.

All in all, these heuristics are fragile, what you get is pretty little, and based on *guesses*.
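To show how little the scan has to go on, here is a minimal sketch of the word-by-word stack scan from step 4. It is not the Valgrind code: the names are hypothetical, memory access and the "looks like a return address" test are passed in as callbacks, and the two demo callbacks exist only so the sketch can be exercised.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*ReadWord)(uint32_t addr);   /* read a stack word */
typedef int (*LooksLikeRA)(uint32_t word);     /* heuristic predicate */

/* Scan word-aligned stack slots from sp up to the end of the current
   4KB page, collecting at most max_out words the predicate accepts. */
static size_t scan_stack(uint32_t sp, ReadWord rd, LooksLikeRA looks_like_ra,
                         uint32_t *out, size_t max_out)
{
    uint32_t page = sp & ~0xfffu;
    size_t n = 0;

    for (uint32_t p = sp & ~0x3u;
         (p & ~0xfffu) == page && n < max_out;
         p += 4) {
        uint32_t w = rd(p);
        if (looks_like_ra(w))
            out[n++] = w;
    }
    return n;
}

/* Demo memory: every fourth word holds a value in a made-up code range. */
static uint32_t demo_read(uint32_t addr)
{
    return (addr % 16u == 0u) ? 0x8000u + (addr & 0xfffu) : 0u;
}

/* Demo predicate: accept anything inside the made-up code range. */
static int demo_looks_like_ra(uint32_t w)
{
    return w >= 0x8000u && w < 0x9000u;
}
```

With sp near the end of a page (for example 0x7ffe0fe1) only eight words are examined, which illustrates limit [5]: a stack pointer close to the page border leaves very little to scan.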

My assumption is that it's disabled in Valgrind because wrong information is worse than missing information.

E.g. older Gdb versions reported function names resolved with symbol tables, but because code can have a *huge* amount of non-exported functions between the exported symbol addresses, they can be *completely* misleading.  Newer Gdb versions no longer present symbol table based "guesses".
Comment 22 Julian Seward 2013-10-18 13:25:06 UTC
(In reply to comment #20)
> Created attachment 76362 [details]
> Path for switching on manual unwinding

Modified version of this patch committed as 13657; thanks for the patch.

Closing this now.  This has actually been tracking multiple separate
issues (excessive mem use for debuginfo, and EXIDX support) and it
would be better to open new bugs for those things separately, if
required.
Comment 23 Julian Seward 2014-09-02 21:40:35 UTC
EXIDX unwind support was added in r14217.