Created attachment 113482 [details]
The produced binary

I have this program:

#include <stdio.h>
int main() {
  printf("a\n");
  int i = 7 /0;
  printf("b\n");
}

which I compile with "gcc -g t.c -o t"

Running `valgrind -v --memcheck:track-origins=yes --read-var-info=yes --memcheck:show-leak-kinds=all --vgdb=no t` I expect to see the line where there are problems, but it prints:

==24781== Memcheck, a memory error detector
==24781== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24781== Using Valgrind-3.14.0.GIT-90daa486e8-20180620X and LibVEX; rerun with -h for copyright info
==24781== Command: t
==24781==
--24781-- Valgrind options:
--24781-- -v
--24781-- --memcheck:track-origins=yes
--24781-- --read-var-info=yes
--24781-- --memcheck:show-leak-kinds=all
--24781-- --vgdb=no
--24781-- Contents of /proc/version:
--24781-- Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08)
--24781--
--24781-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-sse3
--24781-- Page sizes: currently 4096, max supported 4096
--24781-- Valgrind library directory: /usr/local/lib/valgrind
--24781-- Reading syms from /home/me/t
--24781-- ELF section outside all mapped regions
--24781-- Reading syms from /lib/x86_64-linux-gnu/ld-2.19.so
--24781-- Considering /lib/x86_64-linux-gnu/ld-2.19.so ..
--24781-- .. CRC mismatch (computed c067370a wanted 8c45d3ea)
--24781-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.19.so ..
--24781-- .. CRC is valid
--24781-- warning: addVar: unknown size (buf)
--24781-- warning: addVar: unknown size (buf)
--24781-- warning: addVar: unknown size (buf)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- warning: addVar: unknown size (loadcmds)
--24781-- Reading syms from /usr/local/lib/valgrind/memcheck-amd64-linux
--24781-- object doesn't have a symbol table
--24781-- object doesn't have a dynamic symbol table
--24781-- Scheduler: using generic scheduler lock implementation.
--24781-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
--24781-- REDIR: 0x4017b50 (ld-linux-x86-64.so.2:strlen) redirected to 0x581df42e (???)
--24781-- REDIR: 0x4017900 (ld-linux-x86-64.so.2:index) redirected to 0x581df448 (???)
--24781-- Reading syms from /usr/local/lib/valgrind/vgpreload_core-amd64-linux.so
--24781-- object doesn't have a symbol table
--24781-- Reading syms from /usr/local/lib/valgrind/vgpreload_memcheck-amd64-linux.so
--24781-- object doesn't have a symbol table
==24781== WARNING: new redirection conflicts with existing -- ignoring it
--24781-- old: 0x04017b50 (strlen ) R-> (0000.0) 0x581df42e ???
--24781-- new: 0x04017b50 (strlen ) R-> (2007.0) 0x0402d490 strlen
--24781-- REDIR: 0x4017b20 (ld-linux-x86-64.so.2:strcmp) redirected to 0x402ed40 (strcmp)
--24781-- REDIR: 0x4018850 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4037390 (mempcpy)
--24781-- Reading syms from /lib/x86_64-linux-gnu/libc-2.19.so
--24781-- Considering /lib/x86_64-linux-gnu/libc-2.19.so ..
--24781-- .. CRC mismatch (computed 8b555f82 wanted cd9b3228)
--24781-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.19.so ..
--24781-- .. CRC is valid
--24781-- REDIR: 0x4aa8dc0 (libc.so.6:strcasecmp) redirected to 0x4023720 (_vgnU_ifunc_wrapper)
--24781-- REDIR: 0x4aab0b0 (libc.so.6:strncasecmp) redirected to 0x4023720 (_vgnU_ifunc_wrapper)
--24781-- REDIR: 0x4aa8590 (libc.so.6:memcpy@GLIBC_2.2.5) redirected to 0x4023720 (_vgnU_ifunc_wrapper)
--24781-- REDIR: 0x4aa6910 (libc.so.6:rindex) redirected to 0x402ce10 (rindex)
--24781-- REDIR: 0x4aa4c10 (libc.so.6:strlen) redirected to 0x402d3d0 (strlen)
==24781==
==24781== Process terminating with default action of signal 8 (SIGFPE)
==24781== Integer divide by zero at address 0x10090584FE
==24781==    at 0x401149: ??? (in /home/me/t)
==24781==    by 0x4A44B44: (below main) (libc-start.c:287)
a
--24781-- REDIR: 0x4a9f600 (libc.so.6:free) redirected to 0x402afd0 (free)
==24781==
==24781== HEAP SUMMARY:
==24781==     in use at exit: 0 bytes in 0 blocks
==24781==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==24781==
==24781== All heap blocks were freed -- no leaks are possible
==24781==
==24781== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==24781== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Why isn't the code line shown at

==24781== Integer divide by zero at address 0x10090584FE
==24781==    at 0x401149: ??? (in /home/me/t)

How shall I compile in order to see the lines in valgrind's output? It worked in the past and I don't know when and why it stopped working.

I use gcc (GCC) 7.3.1 20180618 and valgrind-3.14.0.GIT-90daa486e8-20180620X.
It turns out that when linking with ld.bfd 2.30 or gold (2.31.51.20180630) 1.16, valgrind can read the debug information, but not when ld.bfd 2.31.51.20180630 is used.

I described the whole case at https://sourceware.org/bugzilla/show_bug.cgi?id=23357 and uploaded the produced binaries there. Please read it; I don't want to repeat the text here, in order to keep the total text for the case shorter.

While I filed a PR for ld.bfd, I am not claiming that the problem is in ld.bfd rather than in valgrind.
For the time being there are three workarounds. Link programs explicitly with gold:

gcc -fuse-ld=gold

or switch off the implicitly enabled separate-code on Linux/x86:

gcc -fuse-ld=bfd -Wl,-z,noseparate-code

or change the default linker by replacing '/usr/local/x86_64-pc-linux-gnu/bin/ld' (the path can be different on your system), which is a copy of /usr/local/x86_64-pc-linux-gnu/bin/ld.bfd, with /usr/local/x86_64-pc-linux-gnu/bin/ld.gold.
Here is a tiny program:

https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

Valgrind can't read its DWARF debug info:

[hjl@gnu-cfl-1 simple-linux]$ make LD=ld.gold
gcc -g -O0 -c -o test.o test.c
test.c: In function ‘main’:
test.c:22:9: warning: division by zero [-Wdiv-by-zero]
   i = 7 / 0;
         ^
gcc -g -c -o start.o start.S
gcc -g -c -o syscall.o syscall.S
ld.gold -o test test.o start.o syscall.o
./test hello world
a
make: *** [Makefile:12: all] Floating point exception
[hjl@gnu-cfl-1 simple-linux]$ valgrind ./test
==27555== Memcheck, a memory error detector
==27555== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27555== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27555== Command: ./test
==27555==
a
==27555==
==27555== Process terminating with default action of signal 8 (SIGFPE)
==27555== Integer divide by zero at address 0x1002B95532
==27555==    at 0x40015E: ??? (in /export/ssd/git/github/simple-linux/test)
==27555==    by 0x4001A8: ??? (in /export/ssd/git/github/simple-linux/test)
==27555==
==27555== HEAP SUMMARY:
==27555==     in use at exit: 0 bytes in 0 blocks
==27555==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==27555==
==27555== All heap blocks were freed -- no leaks are possible
==27555==
==27555== For counts of detected and suppressed errors, rerun with: -v
==27555== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Floating point exception
[hjl@gnu-cfl-1 simple-linux]$ gdb test
GNU gdb (GDB) Fedora 8.1-19.fc28
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...done.
(gdb) r
Starting program: /export/ssd/git/github/simple-linux/test
a

Program received signal SIGFPE, Arithmetic exception.
0x000000000040015c in main (argc=1, argv=0x7fffffffd708) at test.c:22
22	  i = 7 / 0;
(gdb)
The root cause is that ld -z separate-code introduces various new PT_LOAD segments and the code in valgrind interpreting the PT_LOAD mappings is very specific about what it does and what it doesn't consider a code or data mapping to match symbols and debuginfo against.

There are various bugs for this:
fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1600034
debian: http://bugs.debian.org/903389
binutils: https://sourceware.org/bugzilla/show_bug.cgi?id=23357

The following valgrind bug is also somewhat related:
https://bugs.kde.org/show_bug.cgi?id=390871
"ELF debug info reader confused with multiple .rodata* sections"

The issue can most easily be seen using --trace-symtab=yes, which will show the PT_LOAD program headers and flags.

There are two places in the code which interpret the mappings/flags and try to map things to debuginfo:
coregrind/m_debuginfo/debuginfo.c (di_notify_mmap) and
coregrind/m_debuginfo/readelf.c (read_elf_debug_info)

Note in the first that it doesn't set up is_ro_map unless compiled for darwin.
Note in the second that it seems to only consider the first PT_LOAD segment and bails out when that doesn't match.
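To compare the PT_LOAD layouts of two binaries outside valgrind, the program headers can also be read directly. What follows is only a minimal sketch, not code from valgrind or binutils: the names flags_str and dump_pt_load are made up for illustration, and it assumes a Linux host with <elf.h>, a 64-bit ELF file, and a file byte order that matches the host.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render p_flags in the style of readelf's "Flg"
   column, e.g. "R E" or "RW ". buf must have room for 4 bytes. */
const char *flags_str(Elf64_Word flags, char buf[4])
{
    assert(buf != NULL);
    buf[0] = (flags & PF_R) ? 'R' : ' ';
    buf[1] = (flags & PF_W) ? 'W' : ' ';
    buf[2] = (flags & PF_X) ? 'E' : ' ';
    buf[3] = '\0';
    return buf;
}

/* Hypothetical helper: print every PT_LOAD program header of a 64-bit
   ELF file. Returns the number of PT_LOAD segments, or -1 on error.
   Assumes the file has the same byte order as the host. */
int dump_pt_load(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    int count = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    count = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            count = -1;
            goto out;
        }
        if (ph.p_type != PT_LOAD)
            continue;   /* only loadable segments matter here */
        char buf[4];
        printf("LOAD off=0x%06llx vaddr=0x%016llx filesz=0x%06llx memsz=0x%06llx %s\n",
               (unsigned long long)ph.p_offset,
               (unsigned long long)ph.p_vaddr,
               (unsigned long long)ph.p_filesz,
               (unsigned long long)ph.p_memsz,
               flags_str(ph.p_flags, buf));
        count++;
    }
out:
    fclose(f);
    return count;
}
```

Calling dump_pt_load on a binary linked with and without -z separate-code should make the additional read-only segments visible; readelf -lW remains the authoritative tool for this.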
(In reply to Mark Wielaard from comment #4)
> The root cause is that ld -z separate-code introduces various new PT_LOAD

This is very misleading. This simple test:

https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

doesn't use -z separate-code:

[hjl@gnu-cfl-1 simple-linux]$ readelf -lW test

Elf file type is EXEC (Executable file)
Entry point 0x40019a
There are 3 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x000240 0x000240 R E 0x1000
  LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x000000 0x000000 RW  0x1000
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RWE 0x10

 Section to Segment mapping:
  Segment Sections...
   00     .text .rodata .eh_frame
   01     .data .bss
   02
[hjl@gnu-cfl-1 simple-linux]$
(In reply to H.J. Lu from comment #5)
> (In reply to Mark Wielaard from comment #4)
> > The root cause is that ld -z separate-code introduces various new PT_LOAD
>
> This is very misleading.

I don't know why you say that, or why you removed the rest of the sentence "... and the code in valgrind interpreting the PT_LOAD mappings is very specific about what it does and what it doesn't consider a code or data mapping to match symbols and debuginfo against."

The point is simply that if people want to work around it, the simplest thing to do for now is to use ld -z noseparate-code, which is what fedora is doing until we get valgrind fixed.

I didn't say that it is the only way that valgrind might get confused by different PT_LOAD segment setups. In fact, if you look at the code pointed out in comment #4 you'll see that there are lots of ways to confuse valgrind, and that the code is horribly architecture and OS specific.

It just doesn't work with -z separate-code at the moment. We have to fix that (and hopefully fix other things while we do it). But for now just don't use -z separate-code, or build binutils with --disable-separate-code.
The problem is with readonly PT_LOAD segments. Please try users/hjl/pr395682/master branch at https://github.com/hjl-tools/valgrind/tree/users/hjl/pr395682/master
That looks very similar to my version at https://github.com/tomhughes/valgrind/commit/40a30ca68769c0825e078b731cd115849b1a6744 but I think you've got something a bit extra to cope with multiple ro segments?
Created attachment 113898 [details]
Accept read-only PT_LOAD segments and .rodata by ld -z separate-code

I combined Tom's fix for separate read-only segments with the .rodata reading fix from H.J., but changed it to always use the bias from the loaded range. I wasn't comfortable with all the changes in the asserts, especially because they would contradict the comments directly before them. And they seemed unnecessary.

This patch fixes the issue with the reported binary in this bug and with the i386 glibc ld.so created on fedora (when built with ld -z separate-code).
(In reply to Mark Wielaard from comment #9)
> Created attachment 113898 [details]
> Accept read-only PT_LOAD segments and .rodata by ld -z separate-code
>
> I combined Tom's fix for separate read-only segments with the .rodata
> reading fix from H.J., but changed it to always use the bias from the loaded
> range. I wasn't comfortable with all the changes in the asserts especially
> because they would contradict the comments directly before them. And they
> seemed unnecessary.
>
> This patch fixes the issue with the reported binary in this bug and with the
> i386 glibc ld.so created on fedora (when build with ld -z separate-code).

It doesn't fix:

https://github.com/hjl-tools/simple-linux/tree/divide-by-zero
Created attachment 113899 [details]
A test

With the proposed fix:

[hjl@gnu-cfl-1 build-x86_64-linux]$ ./.in_place/memcheck-amd64-linux /export/gnu/import/git/github/simple-linux/test
==30545== Memcheck, a memory error detector
==30545== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30545== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==30545== Command: /export/gnu/import/git/github/simple-linux/test
==30545==
a
==30545==
==30545== Process terminating with default action of signal 8 (SIGFPE)
==30545== Integer divide by zero at address 0x100387C52A
==30545==    at 0x40015E: ??? (in /export/ssd/git/github/simple-linux/test)
==30545==    by 0x4001A8: ??? (in /export/ssd/git/github/simple-linux/test)
==30545==
==30545== HEAP SUMMARY:
==30545==     in use at exit: 0 bytes in 0 blocks
==30545==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==30545==
==30545== All heap blocks were freed -- no leaks are possible
==30545==
==30545== For counts of detected and suppressed errors, rerun with: -v
==30545== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Floating point exception

With my fixes:

[hjl@gnu-cfl-1 build-x86_64-linux]$ ./.in_place/memcheck-amd64-linux /export/gnu/import/git/github/simple-linux/test
==31384== Memcheck, a memory error detector
==31384== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31384== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==31384== Command: /export/gnu/import/git/github/simple-linux/test
==31384==
a
==31384==
==31384== Process terminating with default action of signal 8 (SIGFPE)
==31384== Integer divide by zero at address 0x100388C52A
==31384==    at 0x40015E: main (test.c:22)
==31384==
==31384== HEAP SUMMARY:
==31384==     in use at exit: 0 bytes in 0 blocks
==31384==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==31384==
==31384== All heap blocks were freed -- no leaks are possible
==31384==
==31384== For counts of detected and suppressed errors, rerun with: -v
==31384== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Floating point exception
[hjl@gnu-cfl-1 build-x86_64-linux]$
(In reply to H.J. Lu from comment #10)
> (In reply to Mark Wielaard from comment #9)
> > This patch fixes the issue with the reported binary in this bug and with the
> > i386 glibc ld.so created on fedora (when build with ld -z separate-code).
>
> It doesn't fix:
>
> https://github.com/hjl-tools/simple-linux/tree/divide-by-zero

Yes, you are right, but isn't that a totally different case? Your example seems to have everything in a single rx segment and then a zero size rw segment.

Is that something that is specific to the way you created that testcase without actually linking against any library, or does that happen normally in some configurations?

I think it is best to create a new bug for this case and post the specific patch that solves this issue in that bug.

Even though your patches might be totally correct and help fix this issue, I found it hard to reason about them because some just seemed to disable various asserts without updating the comments that explained why we needed those asserts (if the comments are wrong, please update them together with the code changes).
(In reply to Mark Wielaard from comment #12)
> yes, you are right, but isn't that a totally different case? Your example
> seems to have everything in a single rx segment and then a zero size rw
> segment.
>
> Is that something that is specific to they way you created that testcase
> without actually linking against any library, or does that happen normally
> in some configurations?

The gold linker generates it.

> I think it is best to create a new bug for this case and post the specific
> patch that solves this issue in that bug.
>

I opened:

https://bugs.kde.org/show_bug.cgi?id=396476
*** Bug 384727 has been marked as a duplicate of this bug. ***
commit 64aa729bfae71561505a40c12755bd6b55bb3061
Author: Mark Wielaard <mark@klomp.org>
Date:   Thu Jul 12 13:56:00 2018 +0200

    Accept read-only PT_LOAD segments and .rodata.

    The new binutils ld -z separate-code option creates multiple read-only
    PT_LOAD segments and might place .rodata in a non-executable segment.

    Allow and keep track of separate read-only segments and allow a readonly
    page with .rodata section.

    Based on patches from Tom Hughes <tom@compton.nu> and
    H.J. Lu <hjl.tools@gmail.com>.

    https://bugs.kde.org/show_bug.cgi?id=395682
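The idea of keeping track of rx, rw and read-only segments separately can be illustrated with a small classifier over program header flags. This is a sketch only and is not the code from the commit above: seg_kind, classify_flags, seg_summary and summarize_loads are hypothetical names, and it assumes a Linux host with <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical classification of a PT_LOAD segment by its permissions. */
enum seg_kind { SEG_OTHER, SEG_RX, SEG_RW, SEG_RO };

enum seg_kind classify_flags(Elf64_Word flags)
{
    int r = (flags & PF_R) != 0;
    int w = (flags & PF_W) != 0;
    int x = (flags & PF_X) != 0;

    if (r && x && !w) return SEG_RX;   /* code */
    if (r && w && !x) return SEG_RW;   /* data */
    if (r && !w && !x) return SEG_RO;  /* read-only, e.g. .rodata */
    return SEG_OTHER;
}

/* Counts per kind; "empty" counts PT_LOAD segments with p_memsz == 0. */
struct seg_summary { unsigned rx, rw, ro, other, empty; };

struct seg_summary summarize_loads(const Elf64_Phdr *ph, size_t n)
{
    struct seg_summary s = {0, 0, 0, 0, 0};
    assert(ph != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;               /* ignore GNU_STACK and friends */
        if (ph[i].p_memsz == 0)
            s.empty++;
        switch (classify_flags(ph[i].p_flags)) {
        case SEG_RX: s.rx++; break;
        case SEG_RW: s.rw++; break;
        case SEG_RO: s.ro++; break;
        default:     s.other++; break;
        }
    }
    return s;
}
```

Fed with the program headers from comment #5 (one R E segment and one zero-size RW segment), the summary has no read-only segment at all but one empty segment, which matches the observation that the gold-linked test is a different case from the -z separate-code layout.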