Bug 353192 - Debug info/data section not detected on AMD64
Summary: Debug info/data section not detected on AMD64
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: 3.10 SVN
Platform: Ubuntu Linux
Importance: NOR grave
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-25 19:20 UTC by Jack
Modified: 2025-08-22 08:38 UTC
6 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Jack 2015-09-25 19:20:39 UTC
An ELF binary with a single RWE LOAD segment (as opposed to one RW and one RE) does not get its symbols loaded on 64-bit Ubuntu.

This appears to be caused by the `is_rx_map` and `is_rw_map` checks in debuginfo.c: an RWE segment is accepted on x86, but not on x86_64/AMD64.

I propose moving the AMD64 ifdef up to the same line as the x86 one.

Reproducible: Always

Steps to Reproduce:
1. Run an ELF binary with a single RWE LOAD segment under valgrind with the `-v` flag.

Actual Results:  
"Reading syms from <program name>" never appears; Valgrind goes straight to loading shared libraries and reading their symbols. No symbols appear in the Valgrind output file (with callgrind, for instance).

Expected Results:  
"Reading syms from <program name>" should be printed. Callgrind output should have proper symbols.

Because of this, no symbols from the program appear anywhere in Valgrind's output, which makes it mostly useless.
Comment 1 Tom Hughes 2015-09-25 21:05:28 UTC
Well, AMD64 always supports NX, so having a writable code section is bad news because it means you won't get any protection for it at run time. What sort of compiler/linker is producing such a thing?
Comment 2 Jack 2015-09-25 21:11:20 UTC
GCC 4.8.2

Here's the Program Header section of the binary which doesn't get symbols resolved in Valgrind. Perhaps I missed something:

"""
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000188 0x0000000000000188  R E    8
  INTERP         0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000206b0c1 0x00000000021098f8  RWE    200000
  DYNAMIC        0x0000000002062c00 0x0000000002462c00 0x0000000002462c00
                 0x0000000000000350 0x0000000000000350  RW     8
  NOTE           0x00000000000001e4 0x00000000004001e4 0x00000000004001e4
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x0000000001dde2c0 0x00000000021de2c0 0x00000000021de2c0
                 0x000000000000c9c4 0x000000000000c9c4  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
"""
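The key entry above is the LOAD segment with flags RWE. `readelf -l` is the practical way to spot this; the sketch below does the same check programmatically, assuming the whole ELF64 file (or at least its header and program header table) is already in memory. It uses the standard `<elf.h>` definitions; the function name is hypothetical, and extended program-header numbering (`PN_XNUM`) is ignored for brevity.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Count PT_LOAD segments that are both writable and executable in an
   ELF64 image held in memory. Returns -1 if the buffer is not a
   well-formed ELF64 image. */
static long count_wx_load_segments(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);          /* memcpy avoids unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
        || eh.e_ident[EI_CLASS] != ELFCLASS64
        || eh.e_phentsize != sizeof(Elf64_Phdr))
        return -1;

    /* Bounds-check the whole program header table before reading it. */
    if (eh.e_phoff > len
        || (len - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return -1;

    long count = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD && (ph.p_flags & PF_W) && (ph.p_flags & PF_X))
            count++;
    }
    return count;
}
```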
Comment 3 Jack 2015-09-25 21:11:52 UTC
Please let me know if there's more info I can provide.
Comment 4 Patrick Collins 2016-01-08 23:20:09 UTC
I was bitten by this as well. I ended up with program headers that looked like this:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000005eabf4 0x00000000005eabf4  RWE    200000
  LOAD           0x0000000000600000 0x0000000000800000 0x0000000000800000
                 0x000000000003b3f0 0x000000000004efe8  RW     200000
  DYNAMIC        0x000000000060f460 0x000000000080f460 0x000000000080f460
                 0x0000000000000180 0x0000000000000180  RW     8
  NOTE           0x00000000000001c8 0x00000000000001c8 0x00000000000001c8
                 0x0000000000000024 0x0000000000000024  R      4
  GNU_EH_FRAME   0x000000000057c050 0x000000000057c050 0x000000000057c050
                 0x000000000001600c 0x000000000001600c  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000600000 0x0000000000800000 0x0000000000800000
                 0x0000000000011000 0x0000000000011000  R      1

where the W flag in segment 00 was caused by a combination of unusual __attribute__((__section__(foo))) annotations in GCC. I am also on Ubuntu, with amd64. I think this is properly called a bug, at least on amd64, because nothing in any standard prevents .text sections from being mmapped with rwx permissions.


As far as I can tell, moving amd64 into the same bucket as x86 and accepting rwx as well as r-x permissions would be harmless, since Valgrind only tries to read debug symbols for a particular section if it finds *both* a section marked as executable *and* a .debug entry that corresponds to that section. I assume the rationale is that users who treat a writable section as a text section are probably doing something wrong. However, the change won't alter existing behavior unless the user also provides debug info for that section, in which case they probably really do want it treated as text.

At the very least, emitting a warning when --trace-symtab=yes is given would be helpful, because this was very difficult to track down.
Comment 5 Patrick Collins 2016-01-08 23:21:26 UTC
And if it's useful, I can put together a compilable example that displays this behavior. Please let me know.
Comment 6 Fredrik Tolf 2016-03-28 04:42:08 UTC
I also have this issue. I have an executable data segment because I create a new writable/executable section for patchable code:

> .pushsection .genfuns,\"awx\",@progbits;
> [...]
> .popsection

This causes the linker to make the entire data segment RWX. Regardless of the security implications, it seems Valgrind should be able to debug the file with symbol info.


Also, while debugging Valgrind to see why it didn't load my symbols, I encountered what seemed to be unintentional behavior in discard_syms_in_range(). On a completely unrelated munmap() call, it discarded the DebugInfo for my executable because of how the in-range test is formulated. It currently looks like this:

>         if (curr->text_present
>             && curr->text_size > 0
>             && (start+length - 1 < curr->text_avma 
>                 || curr->text_avma + curr->text_size - 1 < start)) {
>            /* no overlap */
>         } else {
>            found = True;
>            break;
>         }

This way, `found` is set not only when the range overlaps, but also when the DebugInfo has no text range at all (text_present is false or text_size is 0). I don't know if there is any information elsewhere that makes this meaningful, but it seems to me that the test should look like this instead:

>         if (curr->text_present && curr->text_size > 0) {
>             if (start+length - 1 < curr->text_avma 
>                 || curr->text_avma + curr->text_size - 1 < start) {
>                 /* no overlap */
>             } else {
>                 found = True;
>                 break;
>             }
>         }

Technically, I guess this should perhaps be another report, but since it doesn't cause any problems in and of itself, I wasn't sure how to report it. :)
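A minimal sketch isolating the two predicates, assuming a cut-down struct with only the fields the test reads (`DebugInfoLite`, `overlaps_original` and `overlaps_fixed` are hypothetical names). It only demonstrates where the two versions differ; whether Valgrind relies on the current behavior elsewhere is not established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Addr;   /* stand-in for Valgrind's Addr */

/* Hypothetical, cut-down DebugInfo with only the fields the test uses. */
typedef struct {
    bool   text_present;
    Addr   text_avma;
    size_t text_size;
} DebugInfoLite;

/* Current form: "no overlap" is only reached when a text range exists
   and misses [start, start+length). A DebugInfo without a text range
   falls through to "overlap". */
static bool overlaps_original(const DebugInfoLite *di, Addr start, size_t length)
{
    assert(di != NULL && length > 0);
    if (di->text_present
        && di->text_size > 0
        && (start + length - 1 < di->text_avma
            || di->text_avma + di->text_size - 1 < start)) {
        return false;   /* no overlap */
    }
    return true;
}

/* Proposed form: only a DebugInfo that has a text range can overlap. */
static bool overlaps_fixed(const DebugInfoLite *di, Addr start, size_t length)
{
    assert(di != NULL && length > 0);
    if (di->text_present && di->text_size > 0) {
        return !(start + length - 1 < di->text_avma
                 || di->text_avma + di->text_size - 1 < start);
    }
    return false;
}
```

The two functions agree whenever a text range is present; they differ only for a DebugInfo with no text range, which the original treats as overlapping any munmap() range.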
Comment 7 Ivo Raisr 2017-05-05 14:22:16 UTC
Patrick, could you please supply a reproducible test case as you offered?
Comment 8 Fredrik Tolf 2018-02-24 11:35:19 UTC
This is my reproducible testcase:

#include <stdlib.h>

asm(".pushsection .foo,\"awx\",@progbits;"
    ".type writeablefunction, @function;"
    "writeablefunction:"
    "ret;"
    ".popsection;");

int main(int argc, char **argv)
{
    malloc(128);
    return(0);
}

I compiled with "gcc -g -Wall -o vgtest vgtest.c", but I reckon it should be fairly tolerant with compiler flags. Valgrind output is:

$ valgrind --tool=memcheck --leak-check=full ./vgtest
==27841== Memcheck, a memory error detector
==27841== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==27841== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==27841== Command: ./vgtest
==27841== 
==27841== 
==27841== HEAP SUMMARY:
==27841==     in use at exit: 128 bytes in 1 blocks
==27841==   total heap usage: 1 allocs, 0 frees, 128 bytes allocated
==27841== 
==27841== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==27841==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==27841==    by 0x1086C8: ??? (in /tmp/vgtest)
==27841==    by 0x4E582B0: (below main) (libc-start.c:291)
==27841== 
==27841== LEAK SUMMARY:
==27841==    definitely lost: 128 bytes in 1 blocks
==27841==    indirectly lost: 0 bytes in 0 blocks
==27841==      possibly lost: 0 bytes in 0 blocks
==27841==    still reachable: 0 bytes in 0 blocks
==27841==         suppressed: 0 bytes in 0 blocks
==27841== 
==27841== For counts of detected and suppressed errors, rerun with: -v
==27841== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The point here is the missing symbol for "main" in the loss record: that frame shows up as "??? (in /tmp/vgtest)".
Comment 9 Fredrik Tolf 2018-02-24 11:55:23 UTC
Also, this is a patch that fixes the issue for me. It also includes the fix I mentioned above.

--- valgrind-3.12.0~svn20160714.orig/coregrind/m_debuginfo/debuginfo.c
+++ valgrind-3.12.0~svn20160714/coregrind/m_debuginfo/debuginfo.c
@@ -359,14 +359,14 @@ static Bool discard_syms_in_range ( Addr
       while (True) {
          if (curr == NULL)
             break;
-         if (curr->text_present
-             && curr->text_size > 0
-             && (start+length - 1 < curr->text_avma 
-                 || curr->text_avma + curr->text_size - 1 < start)) {
-            /* no overlap */
-        } else {
-           found = True;
-           break;
+         if (curr->text_present && curr->text_size > 0) {
+           if (start+length - 1 < curr->text_avma 
+               || curr->text_avma + curr->text_size - 1 < start) {
+              /* no overlap */
+           } else {
+              found = True;
+              break;
+           }
         }
         curr = curr->next;
       }
@@ -944,10 +944,10 @@ ULong VG_(di_notify_mmap)( Addr a, Bool
    is_ro_map = False;
 
 #  if defined(VGA_x86) || defined(VGA_ppc32) || defined(VGA_mips32) \
-      || defined(VGA_mips64)
+      || defined(VGA_mips64) || defined(VGA_amd64)
    is_rx_map = seg->hasR && seg->hasX;
    is_rw_map = seg->hasR && seg->hasW;
-#  elif defined(VGA_amd64) || defined(VGA_ppc64be) || defined(VGA_ppc64le)  \
+#  elif defined(VGA_ppc64be) || defined(VGA_ppc64le)  \
         || defined(VGA_arm) || defined(VGA_arm64)
    is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
    is_rw_map = seg->hasR && seg->hasW && !seg->hasX;

This is against Debian's source tree, however. I hope that doesn't cause too many problems.
Comment 10 Fredrik Tolf 2025-06-15 14:00:46 UTC
This is still an issue, by the way.
Comment 11 Paul Floyd 2025-06-16 19:58:12 UTC
It runs OK on FreeBSD built with clang and lld, which gives 5 LOAD segments:

    LOAD off    0x0000000000000000 vaddr 0x0000000000200000 paddr 0x0000000000200000 align 2**12
         filesz 0x000000000000060c memsz 0x000000000000060c flags r--
    LOAD off    0x0000000000000610 vaddr 0x0000000000201610 paddr 0x0000000000201610 align 2**12
         filesz 0x0000000000000160 memsz 0x0000000000000160 flags r-x
    LOAD off    0x0000000000000770 vaddr 0x0000000000202770 paddr 0x0000000000202770 align 2**12
         filesz 0x0000000000000001 memsz 0x0000000000000001 flags rwx
    LOAD off    0x0000000000000778 vaddr 0x0000000000203778 paddr 0x0000000000203778 align 2**12
         filesz 0x0000000000000170 memsz 0x0000000000000888 flags rw-
    LOAD off    0x00000000000008e8 vaddr 0x00000000002048e8 paddr 0x00000000002048e8 align 2**12
         filesz 0x0000000000000038 memsz 0x0000000000000048 flags rw-
Comment 12 Fredrik Tolf 2025-06-17 01:58:53 UTC
It seems either clang or lld does something different, then. The problem for me on Debian is that ld apparently merges the .data section with my .foo section into one RWX segment, and Valgrind (without patching) apparently doesn't consider a mapping to be a program mapping unless it has at least one pure RX and one pure RW segment.

If you ask me, I think Valgrind should decide which mappings to load debug symbols from by checking for an ELF signature, rather than by some heuristic on segment properties, but maybe that has other weird consequences that I haven't foreseen.
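The "ELF signature" check itself is simple: the first four bytes of an ELF file are the magic `\177ELF` (`ELFMAG`/`SELFMAG` in `<elf.h>`). A minimal sketch, with hypothetical function names; it covers only recognizing the file, not the harder question discussed in the following comments of which mappings of that file hold code or data.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the buffer starts with the ELF magic "\177ELF". */
static bool has_elf_magic(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL || len == 0);
    return len >= SELFMAG && memcmp(hdr, ELFMAG, SELFMAG) == 0;
}

/* True if the file at 'path' can be opened and starts with the ELF magic. */
static bool file_has_elf_magic(const char *path)
{
    unsigned char hdr[SELFMAG];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;
    size_t got = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    return has_elf_magic(hdr, got);
}
```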
Comment 13 Paul Floyd 2025-08-19 06:27:35 UTC
(In reply to Fredrik Tolf from comment #12)
> Seems either clang or lld does something different, then. The problem for me
> on Debian is that  ld apparently merges the .data section with my .foo
> section into one RWX segment, and Valgrind (without patching) apparently
> doesn't consider a mapping without at least one pure RX and one pure RW
> section to be a program mapping.
> 
> If you ask me, I think Valgrind should recognize mappings to load debug
> symbols from by their having an ELF signature, rather than by some segment
> property heuristic, but maybe that has other weird consequences that I
> haven't foreseen.

I agree. I've already changed the code for rw- segments to use the number of segments from the ELF program headers rather than a hard-coded count of 1.
We should do the same for r-x (and rwx).
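One possible reading of "use the number of segments from the program headers", sketched under assumptions: count the PT_LOAD entries in each permission class so the mmap-notification code knows how many r-x, rw- or rwx mappings to expect, instead of assuming one. `LoadCounts` and `count_load_segments` are hypothetical names, not Valgrind's actual code; the test data mirrors the five FreeBSD segments listed in comment 11.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Per-permission counts of PT_LOAD segments from the program headers. */
typedef struct {
    size_t ro;    /* r-- */
    size_t rx;    /* r-x */
    size_t rw;    /* rw- */
    size_t rwx;   /* rwx */
} LoadCounts;

static LoadCounts count_load_segments(const Elf64_Phdr *phdrs, size_t phnum)
{
    assert(phdrs != NULL || phnum == 0);
    LoadCounts c = { 0, 0, 0, 0 };
    for (size_t i = 0; i < phnum; i++) {
        if (phdrs[i].p_type != PT_LOAD)
            continue;
        Elf64_Word f = phdrs[i].p_flags & (PF_R | PF_W | PF_X);
        if (f == PF_R)                       c.ro++;
        else if (f == (PF_R | PF_X))         c.rx++;
        else if (f == (PF_R | PF_W))         c.rw++;
        else if (f == (PF_R | PF_W | PF_X))  c.rwx++;
        /* other combinations (e.g. execute-only) are left uncounted */
    }
    return c;
}
```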

There are two big headaches.
1. As you can see in the code, there is a lot of variation between platforms. I'm not sure it's possible to write code that is genuinely generic and cross-platform.
2. This code gets called both for binaries that are already loaded and whenever the guest loads a binary (at startup or via dlopen). The "already loaded" case is horrible, as Valgrind may have merged contiguous segments.
Comment 14 Fredrik Tolf 2025-08-19 15:18:46 UTC
(In reply to Paul Floyd from comment #13)
> There are two big headaches.
> 1. As you see in the code there is a lot of variation between platforms. I'm
> not sure if it's possible to have code that is genuinely generic and cross
> platform.

I will admit that I didn't study the code in great detail, so I may well misunderstand what you are writing about, but from what I saw, the variation between platforms seemed to consist mainly of different segment-property heuristics for the different platforms, which is exactly what I suggest removing entirely. If Valgrind simply checks whether a mapped file is an ELF file, shouldn't that remove the need for this platform variation?

> 2. This code gets called both for binaries that are already loaded and
> whenever the guest loads a binary (at startup of via dlopen). The "already
> loaded" case is horrible as Valgrind may have merged segments that are
> contiguous.

Again, to be clear, I'm speaking from almost complete ignorance here, so please tell me what I'm missing, but when I'm using an unpatched Valgrind to debug a program with this problem, and I want the symbol name for a certain address, I just check the process' memory mappings, calculate the file offset of the address, and then just use gdb on the file mapped at that address to ask for the symbol corresponding to that "virtual address" (with "info line *$address"). While somewhat laborious, this seems to work every time. Is there a reason Valgrind can't just do basically the same thing for any address whose page is mapped from an ELF file? That fundamentally seems like the approach I'd take if I were to build an address-to-symbol mapper. I guess what I'm saying is that I would think that all you really need is the file path and offset for any file-mapped page in the guest address space, and any "segment info" or other more abstract runtime information should be superfluous. Am I wrong?
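A minimal sketch of the two-step lookup described above, assuming the mapping's start, end and file offset have been parsed from /proc/<pid>/maps and the file's program headers have been read. `Mapping`, `addr_to_file_offset` and `file_offset_to_vaddr` are hypothetical names. The result is the link-time address one would pass to `info line *ADDR` in gdb on that file. Addresses in a segment's zero-filled tail (where `p_memsz` exceeds `p_filesz`, e.g. .bss) have no file offset and are not handled.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One file-backed mapping, as in a /proc/<pid>/maps line. */
typedef struct {
    uint64_t start;    /* first address of the mapping */
    uint64_t end;      /* one past the last address */
    uint64_t offset;   /* file offset that 'start' corresponds to */
} Mapping;

/* Step 1: runtime address -> offset within the mapped file. */
static bool addr_to_file_offset(const Mapping *m, uint64_t addr, uint64_t *off)
{
    assert(m != NULL && off != NULL);
    if (addr < m->start || addr >= m->end)
        return false;
    *off = m->offset + (addr - m->start);
    return true;
}

/* Step 2: file offset -> link-time virtual address, via the PT_LOAD
   segment whose file image contains that offset. */
static bool file_offset_to_vaddr(const Elf64_Phdr *phdrs, size_t phnum,
                                 uint64_t off, uint64_t *vaddr)
{
    assert((phdrs != NULL || phnum == 0) && vaddr != NULL);
    for (size_t i = 0; i < phnum; i++) {
        const Elf64_Phdr *p = &phdrs[i];
        if (p->p_type == PT_LOAD
            && off >= p->p_offset && off - p->p_offset < p->p_filesz) {
            *vaddr = p->p_vaddr + (off - p->p_offset);
            return true;
        }
    }
    return false;
}
```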
Comment 15 Paul Floyd 2025-08-19 15:45:01 UTC
(In reply to Fredrik Tolf from comment #14)

> If Valgrind simply
> checks whether a mapped file is an ELF file, should that not effectively
> remove the need for this platform variation?

My concern is that some of the platform hard-codedness may be really necessary.

> Again, to be clear, I'm speaking from almost complete ignorance here, so
> please tell me what I'm missing, but when I'm using an unpatched Valgrind to
> debug a program with this problem, and I want the symbol name for a certain
> address, I just check the process' memory mappings, calculate the file
> offset of the address, and then just use gdb on the file mapped at that
> address to ask for the symbol corresponding to that "virtual address" (with
> "info line *$address"). While somewhat laborious, this seems to work every
> time. Is there a reason Valgrind can't just do basically the same thing for
> any address whose page is mapped from an ELF file? That fundamentally seems
> like the approach I'd take if I were to build an address-to-symbol mapper. I
> guess what I'm saying is that I would think that all you really need is the
> file path and offset for any file-mapped page in the guest address space,
> and any "segment info" or other more abstract runtime information should be
> superfluous. Am I wrong?

Valgrind has its own DWARF reader. We're currently looking at switching to GPLv3, which would allow us to use code from gdb and binutils. The problem here is detecting the conditions that trigger looking in a segment for symbols. There's more to it than error callstacks for the guest binary. It's nice to be able to generate a proper callstack for Valgrind itself when "the impossible happens". We also need symbols for all of the redirection functions, which use an encoding to indicate the function name and library that they should redirect.
Comment 16 Fredrik Tolf 2025-08-19 16:02:30 UTC
(In reply to Paul Floyd from comment #15)
> Valgrind has its own DWARF reader.

Certainly; I didn't mean to imply that Valgrind should actually call on gdb, I just used "info line *$address" as an example of fetching the symbol for a "file offset" address.

> The problem here is detecting the conditions that trigger 
> looking in a segment for symbols.

I'm sorry if I'm missing something again (not being a Valgrind developer, I haven't needed to delve deeply into the concrete debug-info formats and whatnot), but "looking in a segment for symbols" sounds needlessly complicated from my naïve perspective. My perspective is to look at it more as "finding the symbol for a file offset". Is that perspective not valid? You mention generating symbols for Valgrind itself as well; given that Valgrind is also mapped into the guest's address space, wouldn't that work for that case just as well?

> My concern is that some of the platform hard-codedness may be really
> necessary.

Interesting. Are we talking about embedded architectures, or something? What would be an example of a platform where you can't match a guest address to a file path and offset? Or am I misunderstanding the problem?
Comment 17 Paul Floyd 2025-08-19 16:32:30 UTC
(In reply to Fredrik Tolf from comment #16)
> (In reply to Paul Floyd from comment #15)
> > Valgrind has its own DWARF reader.
> 
> Certainly; I didn't mean to imply that Valgrind should actually call on gdb,
> I just used "info line *$address" as an example of fetching the symbol for a
> "file offset" address.
> 
> > The problem here is detecting the conditions that trigger 
> > looking in a segment for symbols.
> 
> I'm sorry if I'm missing something again (not being a Valgrind developer, I
> haven't needed to delve deeply into the concrete debug-info formats and
> whatnot), but "looking in a segment for symbols" sounds needlessly
> complicated from my naïve perspective. My perspective is to look at it more
> as "finding the symbol for a file offset". Is that perspective not valid?
> You mention generating symbols for Valgrind itself as well; given that
> Valgrind is also mapped into the guest's address space, wouldn't that work
> for that case just as well?

We need to look for redirects when an executable segment is loaded. At the same time, we also record the memory range and the associated ELF file. I don't remember if we do anything with the DWARF segment at that time; I suspect not.

It's the other way around: the guest is mapped into Valgrind's address space. Valgrind mmaps the guest binary and ld.so.

> > My concern is that some of the platform hard-codedness may be really
> > necessary.
> 
> Interesting. Are we talking about embedded architectures, or something?

No, just that every architecture seems to be different with many special cases.

> What
> would be an example of a platform where you can't match a guest address to a
> file path and offset? Or am I misunderstanding the problem?

That is what we do, but only when we have decided that the memory was loaded from some variation of rw- or r-x segments. And that can be quite tricky, especially for the tool binary.
Comment 18 Fredrik Tolf 2025-08-20 00:27:23 UTC
(In reply to Paul Floyd from comment #17)
> but only when we have decided that the memory was loaded
> from some variation of rw- or r-x segments. And that can be quite tricky,
> especially for the tool binary.

I don't mean to turn this thread into my education on Valgrind internals, but what is the reason for that restriction? Why not just load symbols from all ELF files, regardless of their segment properties?
Comment 19 Paul Floyd 2025-08-20 08:57:10 UTC
We want to avoid being too flexible and loading/executing binaries that the OS would reject. On FreeBSD, "WX" segments are not allowed by default, so a binary with an rwx segment should fail to run unless the kern.elf[32|64].allow_wx sysctl is set to 1 or the binary is tagged with the NT_FREEBSD_FCTL_WXNEEDED note. Complicating matters further, Valgrind itself needs to allow WX. Judging from the comments in the code, the situation is similar for Linux s390.
Comment 20 Fredrik Tolf 2025-08-21 20:51:29 UTC
(In reply to Paul Floyd from comment #19)
> We want to avoid being too flexible and loading/executing binaries that the
> OS would reject. On FreeBSD "WX" segments are not allowed by default.
That sounds like an issue for the mmap() implementation and/or the ELF loader, doesn't it? If an ELF file *does* successfully get mapped, is there a reason not to always load symbols from it?
Comment 21 Paul Floyd 2025-08-22 04:18:58 UTC
(In reply to Fredrik Tolf from comment #20)
> (In reply to Paul Floyd from comment #19)
> > We want to avoid being too flexible and loading/executing binaries that the
> > OS would reject. On FreeBSD "WX" segments are not allowed by default.
> That sounds like an issue for the mmap() implementation and/or the ELF
> loader, doesn't it? If an ELF file *does* successfully get mapped, is there
> a reason not to always load symbols from it?

Not at all. This is a kernel security feature. Allowing both W and X means that you are allowing running code to be modified.

I need to check, but as far as I know SELinux has a similar feature. I also need to see whether AppArmor supports W^X controls.
Comment 22 Paul Floyd 2025-08-22 08:38:04 UTC
Also see

https://en.wikipedia.org/wiki/W%5EX