Summary: | Reading debug info of binaries with readonly PT_LOAD segments | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Дилян Палаузов <dilyan.palauzov> |
Component: | general | Assignee: | Mark Wielaard <mark> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | fweimer, hjl.tools, l.lunak, mark, orivej, tom |
Priority: | NOR | ||
Version: | 3.14 SVN | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Bug Depends on: | |||
Bug Blocks: | 396476 | ||
Attachments: |
The produced binary
Accept read-only PT_LOAD segments and .rodata by ld -z separate-code
A test |
Description
Дилян Палаузов
2018-06-21 09:28:05 UTC
It turns out that valgrind can read the debug information when the binary is linked with ld.bfd 2.30 or with gold (2.31.51.20180630) 1.16, but not when it is linked with ld.bfd 2.31.51.20180630.

I described the whole case at https://sourceware.org/bugzilla/show_bug.cgi?id=23357 and uploaded the produced binaries there. Please read it; I don't want to repeat that text here, in order to keep the total text for the case shorter.

While I filed a PR for ld.bfd, I am not claiming that the problem is in ld.bfd rather than in valgrind.

For the time being, programs can either be linked explicitly with gold:
gcc -fuse-ld=gold
or switch off the implicitly enabled separate-code on Linux/x86:
gcc -fuse-ld=bfd -Wl,-z,noseparate-code
or change the default linker by replacing '/usr/local/x86_64-pc-linux-gnu/bin/ld' (the path can be different on your system), which is a copy of /usr/local/x86_64-pc-linux-gnu/bin/ld.bfd, with /usr/local/x86_64-pc-linux-gnu/bin/ld.gold.

Here is a tiny program:

https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

Valgrind can't read its DWARF debug info:

[hjl@gnu-cfl-1 simple-linux]$ make LD=ld.gold
gcc -g -O0 -c -o test.o test.c
test.c: In function ‘main’:
test.c:22:9: warning: division by zero [-Wdiv-by-zero]
   i = 7 / 0;
         ^
gcc -g -c -o start.o start.S
gcc -g -c -o syscall.o syscall.S
ld.gold -o test test.o start.o syscall.o
./test hello world
a
make: *** [Makefile:12: all] Floating point exception
[hjl@gnu-cfl-1 simple-linux]$ valgrind ./test
==27555== Memcheck, a memory error detector
==27555== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27555== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27555== Command: ./test
==27555==
a
==27555==
==27555== Process terminating with default action of signal 8 (SIGFPE)
==27555== Integer divide by zero at address 0x1002B95532
==27555== at 0x40015E: ??? (in /export/ssd/git/github/simple-linux/test)
==27555== by 0x4001A8: ??? (in /export/ssd/git/github/simple-linux/test)
==27555==
==27555== HEAP SUMMARY:
==27555== in use at exit: 0 bytes in 0 blocks
==27555== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==27555==
==27555== All heap blocks were freed -- no leaks are possible
==27555==
==27555== For counts of detected and suppressed errors, rerun with: -v
==27555== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Floating point exception
[hjl@gnu-cfl-1 simple-linux]$ gdb test
GNU gdb (GDB) Fedora 8.1-19.fc28
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...done.
(gdb) r
Starting program: /export/ssd/git/github/simple-linux/test
a

Program received signal SIGFPE, Arithmetic exception.
0x000000000040015c in main (argc=1, argv=0x7fffffffd708) at test.c:22
22 i = 7 / 0;
(gdb)

The root cause is that ld -z separate-code introduces various new PT_LOAD segments and the code in valgrind interpreting the PT_LOAD mappings is very specific about what it does and what it doesn't consider a code or data mapping to match symbols and debuginfo against.
There are various bugs for this:
fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1600034
debian: http://bugs.debian.org/903389
binutils: https://sourceware.org/bugzilla/show_bug.cgi?id=23357

The following valgrind bug is also somewhat related:
https://bugs.kde.org/show_bug.cgi?id=390871
"ELF debug info reader confused with multiple .rodata* sections"

The issue can most easily be seen using --trace-symtab=yes, which will show the PT_LOAD program headers and flags.

There are two places in the code which interpret the mappings/flags and try to map things to debuginfo:
coregrind/m_debuginfo/debuginfo.c (di_notify_mmap) and
coregrind/m_debuginfo/readelf.c (read_elf_debug_info)

Note in the first that it doesn't set up is_ro_map unless compiled for darwin.
Note in the second that it seems to only consider the first PT_LOAD segment and bails out when that doesn't match.

(In reply to Mark Wielaard from comment #4)
> The root cause is that ld -z separate-code introduces various new PT_LOAD

This is very misleading. This simple test:

https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

doesn't use -z separate-code:

[hjl@gnu-cfl-1 simple-linux]$ readelf -lW test

Elf file type is EXEC (Executable file)
Entry point 0x40019a
There are 3 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x000240 0x000240 R E 0x1000
  LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x000000 0x000000 RW  0x1000
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RWE 0x10

 Section to Segment mapping:
  Segment Sections...
   00     .text .rodata .eh_frame
   01     .data .bss
   02
[hjl@gnu-cfl-1 simple-linux]$

(In reply to H.J. Lu from comment #5)
> (In reply to Mark Wielaard from comment #4)
> > The root cause is that ld -z separate-code introduces various new PT_LOAD
>
> This is very misleading.

I don't know why you say that, or why you removed the rest of the sentence "... and the code in valgrind interpreting the PT_LOAD mappings is very specific about what it does and what it doesn't consider a code or data mapping to match symbols and debuginfo against."

The point is simply that if people want to work around it, the simplest thing to do for now is to use ld -z noseparate-code, which is what fedora is doing until we get valgrind fixed.

I didn't say that it is the only way that valgrind might get confused by different PT_LOAD segment setups. In fact, if you look at the code pointed out in comment #4 you'll see that there are lots of ways to confuse valgrind, and that the code is horribly architecture and OS specific.

It just doesn't work with -z separate-code at the moment. We have to fix that (and hopefully fix other things while we do it). But for now just don't use -z separate-code, or build binutils with --disable-separate-code.

The problem is with readonly PT_LOAD segments. Please try the users/hjl/pr395682/master branch at

https://github.com/hjl-tools/valgrind/tree/users/hjl/pr395682/master

That looks very similar to my version at https://github.com/tomhughes/valgrind/commit/40a30ca68769c0825e078b731cd115849b1a6744 but I think you've got something a bit extra to cope with multiple ro segments?

Created attachment 113898 [details]
Accept read-only PT_LOAD segments and .rodata by ld -z separate-code
I combined Tom's fix for separate read-only segments with the .rodata reading fix from H.J., but changed it to always use the bias from the loaded range. I wasn't comfortable with all the changes in the asserts especially because they would contradict the comments directly before them. And they seemed unnecessary.
This patch fixes the issue with the reported binary in this bug and with the i386 glibc ld.so created on fedora (when built with ld -z separate-code).
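To make the segment layouts discussed in this thread easier to inspect, here is a minimal Python sketch that lists the PT_LOAD program headers of a little-endian ELF64 file and labels each one by its permission flags. It is an illustration only, not valgrind's code: the function names `load_segments` and `classify` and the labels "rx", "rw", "ro" are assumptions made for this example, and only the standard Elf64_Ehdr and Elf64_Phdr field layouts are relied on.

```python
import struct

PT_LOAD = 1                   # program header type for loadable segments
PF_X, PF_W, PF_R = 1, 2, 4    # ELF segment permission bits


def load_segments(data):
    """Return (vaddr, memsz, flags) for every PT_LOAD in an ELF64 LE image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # Elf64_Ehdr: e_phoff is at offset 32, e_phentsize/e_phnum at 54/56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        #             p_filesz, p_memsz, p_align
        fields = struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        p_type, p_flags, p_vaddr, p_memsz = fields[0], fields[1], fields[3], fields[6]
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_memsz, p_flags))
    return segments


def classify(flags):
    """Label a segment by its flags (labels are this sketch's own naming)."""
    r, w, x = bool(flags & PF_R), bool(flags & PF_W), bool(flags & PF_X)
    if r and x and not w:
        return "rx"   # code mapping
    if r and w and not x:
        return "rw"   # data mapping
    if r and not w and not x:
        return "ro"   # read-only mapping, the kind this bug is about
    return "other"
```

Calling `load_segments(open(path, "rb").read())` on a binary and passing each flags value to `classify` should show whether the linker produced any "ro" segments next to the usual "rx" and "rw" ones.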
(In reply to Mark Wielaard from comment #9)
> Created attachment 113898 [details]
> Accept read-only PT_LOAD segments and .rodata by ld -z separate-code
>
> I combined Tom's fix for separate read-only segments with the .rodata
> reading fix from H.J., but changed it to always use the bias from the loaded
> range. I wasn't comfortable with all the changes in the asserts especially
> because they would contradict the comments directly before them. And they
> seemed unnecessary.
>
> This patch fixes the issue with the reported binary in this bug and with the
> i386 glibc ld.so created on fedora (when built with ld -z separate-code).

It doesn't fix:

https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

Created attachment 113899 [details]
A test
With the proposed fix:
[hjl@gnu-cfl-1 build-x86_64-linux]$ ./.in_place/memcheck-amd64-linux /export/gnu/import/git/github/simple-linux/test
==30545== Memcheck, a memory error detector
==30545== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30545== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==30545== Command: /export/gnu/import/git/github/simple-linux/test
==30545==
a
==30545==
==30545== Process terminating with default action of signal 8 (SIGFPE)
==30545== Integer divide by zero at address 0x100387C52A
==30545== at 0x40015E: ??? (in /export/ssd/git/github/simple-linux/test)
==30545== by 0x4001A8: ??? (in /export/ssd/git/github/simple-linux/test)
==30545==
==30545== HEAP SUMMARY:
==30545== in use at exit: 0 bytes in 0 blocks
==30545== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==30545==
==30545== All heap blocks were freed -- no leaks are possible
==30545==
==30545== For counts of detected and suppressed errors, rerun with: -v
==30545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Floating point exception
With my fixes:
[hjl@gnu-cfl-1 build-x86_64-linux]$ ./.in_place/memcheck-amd64-linux /export/gnu/import/git/github/simple-linux/test
==31384== Memcheck, a memory error detector
==31384== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31384== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==31384== Command: /export/gnu/import/git/github/simple-linux/test
==31384==
a
==31384==
==31384== Process terminating with default action of signal 8 (SIGFPE)
==31384== Integer divide by zero at address 0x100388C52A
==31384== at 0x40015E: main (test.c:22)
==31384==
==31384== HEAP SUMMARY:
==31384== in use at exit: 0 bytes in 0 blocks
==31384== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==31384==
==31384== All heap blocks were freed -- no leaks are possible
==31384==
==31384== For counts of detected and suppressed errors, rerun with: -v
==31384== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Floating point exception
[hjl@gnu-cfl-1 build-x86_64-linux]$
(In reply to H.J. Lu from comment #10)
> (In reply to Mark Wielaard from comment #9)
> > This patch fixes the issue with the reported binary in this bug and with the
> > i386 glibc ld.so created on fedora (when built with ld -z separate-code).
>
> It doesn't fix:
>
> https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

Yes, you are right, but isn't that a totally different case? Your example seems to have everything in a single rx segment and then a zero size rw segment.

Is that something that is specific to the way you created that testcase without actually linking against any library, or does that happen normally in some configurations?

I think it is best to create a new bug for this case and post the specific patch that solves this issue in that bug. Even though your patches might be totally correct and help fix this issue, I found it hard to reason about them because some just seemed to disable various asserts without updating the comments that explained why we needed those asserts (if the comments are wrong, please update them together with the code changes).

(In reply to Mark Wielaard from comment #12)
> yes, you are right, but isn't that a totally different case? Your example
> seems to have everything in a single rx segment and then a zero size rw
> segment.
>
> Is that something that is specific to the way you created that testcase
> without actually linking against any library, or does that happen normally
> in some configurations?

Gold linker generates it.

> I think it is best to create a new bug for this case and post the specific
> patch that solves this issue in that bug.
>

I opened:

https://bugs.kde.org/show_bug.cgi?id=396476

*** Bug 384727 has been marked as a duplicate of this bug. ***

commit 64aa729bfae71561505a40c12755bd6b55bb3061
Author: Mark Wielaard <mark@klomp.org>
Date: Thu Jul 12 13:56:00 2018 +0200

    Accept read-only PT_LOAD segments and .rodata.

    The new binutils ld -z separate-code option creates multiple read-only
    PT_LOAD segments and might place .rodata in a non-executable segment.

    Allow and keep track of separate read-only segments and allow a readonly
    page with .rodata section.

    Based on patches from Tom Hughes <tom@compton.nu> and
    H.J. Lu <hjl.tools@gmail.com>.

    https://bugs.kde.org/show_bug.cgi?id=395682
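The commit message's point that .rodata "might" land in a non-executable segment can be illustrated with a small Python sketch that looks up which PT_LOAD segment covers a given address. This is not the logic of the commit: the helper names are made up for this example, the `classic` layout copies the two LOAD entries from the readelf output earlier in this bug, and the `separate` layout and both .rodata addresses are hypothetical values chosen only to show the difference.

```python
PF_X, PF_W, PF_R = 1, 2, 4    # ELF segment permission bits


def containing_segment(segments, addr):
    """Return the (vaddr, memsz, flags) tuple whose range covers addr, or None."""
    for vaddr, memsz, flags in segments:
        if vaddr <= addr < vaddr + memsz:
            return (vaddr, memsz, flags)
    return None


def in_readonly_segment(segments, addr):
    """True if addr lies in a segment that is readable but neither W nor X."""
    seg = containing_segment(segments, addr)
    if seg is None:
        return False
    return (seg[2] & (PF_R | PF_W | PF_X)) == PF_R


# Layout taken from the readelf output in this bug (gold-linked test binary).
classic = [
    (0x400000, 0x240, PF_R | PF_X),   # .text .rodata .eh_frame
    (0x401000, 0x000, PF_R | PF_W),   # .data .bss (zero size)
]

# Hypothetical layout of the kind ld -z separate-code may produce.
separate = [
    (0x400000, 0x1000, PF_R),          # headers and read-only notes
    (0x401000, 0x1000, PF_R | PF_X),   # .text
    (0x402000, 0x1000, PF_R),          # .rodata .eh_frame
    (0x403000, 0x1000, PF_R | PF_W),   # .data .bss
]

# Hypothetical .rodata addresses, one inside each layout.
print(in_readonly_segment(classic, 0x400200))
print(in_readonly_segment(separate, 0x402010))
```

A debuginfo reader that only expects rx and rw mappings has no segment to associate with the second address, which is the situation the commit addresses by tracking read-only segments as well.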