Bug 445235 - Java/Ada/D demangling is probably broken
Summary: Java/Ada/D demangling is probably broken
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on: 445184
Blocks:
  Show dependency treegraph
 
Reported: 2021-11-09 22:52 UTC by Nick Nethercote
Modified: 2024-10-13 11:39 UTC (History)
6 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Patch to activate caling the ada demangler (3.19 KB, patch)
2024-10-12 11:33 UTC, Paul Floyd
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Nethercote 2021-11-09 22:52:43 UTC
+++ This bug was initially created as a clone of Bug #445184 +++

Valgrind has code to demangle symbols for: C++, Rust, Java, Ada, and D. But the Java/Ada/D demangling is probably broken because the code is unreachable.

`VG_(demangle)` has an initial test that requires that the first two chars be "_Z" or "_R", which excludes a bunch of cases. Also, the code in `ML_(cplus_demangle)` that calls the demanglers for Java/Ada/D is a bit different to that for C++ and Rust (in the way the `*_DEMANGLING` constants are used), which may also be a problem.

From a quick look at the code, I think the following is true.
- C++ symbols usually start with "_Z", but "_GLOBAL_" is also possible, which won't work.
- Rust symbols start with "_Z" or "_R".
- Java symbols are a subset of C++ symbols, so should be ok, modulo possible "_GLOBAL_" cases, and possible AUTO_DEMANGLING problems.
- D symbols start with "_D", and so will be broken.
- Ada symbols can start with "_ada_", or with a lower-case letter, and so will be broken.
Comment 1 Paul Floyd 2023-01-29 21:09:38 UTC
The D demangling should be fixed (see https://bugs.kde.org/show_bug.cgi?id=464969)
Comment 2 Sam James 2024-10-03 00:19:03 UTC
Indeed, I don't think the Ada code works: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116945 (note the unmangled output).
Comment 3 Sam James 2024-10-03 00:19:13 UTC
mangled*
Comment 4 Mark Wielaard 2024-10-03 08:39:36 UTC
I can see how we missed the Ada case. The only "demangling" going on (in this example) is turning __ into .
We should probably add a testcase like memcheck/tests/bug464969_d_demangle (where we just encode an Ada mangled name as C function).
Comment 5 laguest@archeia.com 2024-10-09 09:34:18 UTC
@paul

```
procedure Leak_New is
   type IP is access to Integer;

   I : IP := new Integer;
begin
   for I in 1 .. 10 loop
      I := new Integer;
   end for;
end Leak_New;
```
Comment 6 laguest@archeia.com 2024-10-09 09:42:47 UTC
er...

procedure Leak_New is
   type IP is access Integer;

   I : IP := new Integer;
begin
   for C in 1 .. 10 loop
      I := new Integer;
   end loop;
end Leak_New;

objdump -t leak_new
Comment 7 Paul Floyd 2024-10-10 06:02:08 UTC
I installed gnat and I get, without any mods

==48426== 4 bytes in 1 blocks are definitely lost in loss record 1 of 2
==48426==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
==48426==    by 0x4090A6: __gnat_malloc (s-memory.adb:79)
==48426==    by 0x402CE9: _ada_leak_new (leak_new.adb:4)
==48426==    by 0x402CA4: main (b~leak_new.adb:194)
==48426== 
==48426== 40 bytes in 10 blocks are definitely lost in loss record 2 of 2
==48426==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
==48426==    by 0x4090A6: __gnat_malloc (s-memory.adb:79)
==48426==    by 0x402D04: _ada_leak_new (leak_new.adb:7)
==48426==    by 0x402CA4: main (b~leak_new.adb:194)

and if I make a small change to detect the "_ada_" prefix and then call ada_demangle  I get

==68758== 4 bytes in 1 blocks are definitely lost in loss record 1 of 2
==68758==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
==68758==    by 0x4090A6: __gnat_malloc (s-memory.adb:79)
==68758==    by 0x402CE9: leak_new (leak_new.adb:4)
==68758==    by 0x402CA4: main (b~leak_new.adb:194)
==68758== 
==68758== 40 bytes in 10 blocks are definitely lost in loss record 2 of 2
==68758==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
==68758==    by 0x4090A6: __gnat_malloc (s-memory.adb:79)
==68758==    by 0x402D04: leak_new (leak_new.adb:7)
==68758==    by 0x402CA4: main (b~leak_new.adb:194)
Comment 8 Philippe Waroquiers 2024-10-10 18:57:10 UTC
(In reply to Paul Floyd from comment #7)
> I installed gnat and I get, without any mods
> 
> ==48426== 4 bytes in 1 blocks are definitely lost in loss record 1 of 2
> ==48426==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
> ==48426==    by 0x4090A6: __gnat_malloc (s-memory.adb:79)
> ==48426==    by 0x402CE9: _ada_leak_new (leak_new.adb:4)
> ==48426==    by 0x402CA4: main (b~leak_new.adb:194)
> ==48426== 
> ==48426== 40 bytes in 10 blocks are definitely lost in loss record 2 of 2
> ==48426==    at 0x484D2E4: malloc (vg_replace_malloc.c:450)
> ==48426==    by 0x4090A6: __gnat_malloc (s-memory.adb:79)
> ==48426==    by 0x402D04: _ada_leak_new (leak_new.adb:7)
> ==48426==    by 0x402CA4: main (b~leak_new.adb:194)
> 
> and if I make a small change to detect the "_ada_" prefix and then call
> ada_demangle  I get

Note that that _ada_ prefix is added by the compiler to the "top level" procedures (that could at least potentially be a "main" procedure).
If you e.g. write a package something.ads such as:
package Something is
   procedure Some_Procedure;
end Something;

with its body something.adb being:
 with Text_IO; use Text_IO;
 package body Something is
   procedure Some_Procedure is
    begin
       Put_Line ("Hello world");
   end Some_Procedure;
end Something;

then Some_Procedure mangled will be something like: something__some_procedure
(with e.g. 2 underscores between the package name and the procedure name).
Comment 9 Paul Floyd 2024-10-10 20:09:36 UTC
I'm not seeing ada_demangle replace double underscores with a dot, even though the comments say it should. I need to look at it a bit more.
Comment 10 Mark Wielaard 2024-10-11 14:53:27 UTC
So one simple question might be whether we actually want to demangle Ada?
It looks like the mangling is so simple (. -> __, add _ada_ for "top-level" functions)  that people might just be used to the mangled representation.
If we now start to demangle it wouldn't that also change suppressions?
Comment 11 laguest@archeia.com 2024-10-11 15:18:19 UTC
Names can get quite long, it would actually shorten them and make them easier to read, especially if they are Capitalised as they should be.
Comment 12 Paul Floyd 2024-10-12 05:04:16 UTC
The libiberty code does handle quite a few cases. I have no idea how many will be reachable from Valgrind errors or even when debugging Valgrind.

If there are any deficiencies in the demangling then I would prefer to report them to libiberty first.
Comment 13 Paul Floyd 2024-10-12 11:33:31 UTC
Created attachment 174732 [details]
Patch to activate caling the ada demangler

I'm triggering ada demangling based on the presence of __gnat_ada_main_program_name rather than any prefix.

One test gives me the following

Without the change
==68881==    by 0x4975937: fwrite (in /lib/libc.so.7)
==68881==    by 0x416B20: system__file_io__write_buf (s-fileio.adb:1311)
==68881==    by 0x411D83: ada__text_io__put_line (a-textio.adb:1433)
==68881==    by 0x404012: bad_print__uninit_print (bad_print.adp:12)
==68881==    by 0x404022: _ada_main (main.adb:6)
==68881==    by 0x403F8A: main (b~main.adb:260)

With the change
==68832==    by 0x4975937: fwrite (in /lib/libc.so.7)
==68832==    by 0x416B20: system.file_io.write_buf (s-fileio.adb:1311)
==68832==    by 0x411D83: ada.text_io.put_line (a-textio.adb:1433)
==68832==    by 0x404012: bad_print.uninit_print (bad_print.adp:12)
==68832==    by 0x404022: main (main.adb:6)
==68832==    by 0x403F8A: main (b~main.adb:260)
Comment 14 Philippe Waroquiers 2024-10-12 14:17:43 UTC
(In reply to Mark Wielaard from comment #10)
> So one simple question might be whether we actually want to demangle Ada?
> It looks like the mangling is so simple (. -> __, add _ada_ for "top-level"
> functions)  that people might just be used to the mangled representation.
> If we now start to demangle it wouldn't that also change suppressions?

Effectively, typically there is no real difficulty to read the mangled Ada names. E.g. I am regularly using kcachegrind to look at callgrind perf results,
and kcachegrind shows mangled names that can be interpreted without much difficulties.

For what concerns suppressions: our suppressions are using mangled names. The user manual documents explicitly for c++ that mangled names have to be used for suppressions. Suppression matching is not calling the demangler (see m_errormgr.c foComplete).
This is calling VG_(get_fname_no_cxx_demangle). I will imagine that then no demangler at all will be called.

So, if ever the Ada demangler is made working, I do not think that this will impact suppression matching (and it would be better that this
does not impact supp matching as otherwise that will be a significant backward incompatible change).
Comment 15 Mark Wielaard 2024-10-12 15:19:11 UTC
(In reply to Paul Floyd from comment #13)
> Created attachment 174732 [details]
> Patch to activate caling the ada demangler
> 
> I'm triggering ada demangling based on the presence of
> __gnat_ada_main_program_name rather than any prefix.

O, clever. I like that. Patch looks good.

(In reply to Philippe Waroquiers from comment #14)
> For what concerns suppressions: our suppressions are using mangled names.
> The user manual documents explicitly for c++ that mangled names have to be
> used for suppressions. Suppression matching is not calling the demangler
> (see m_errormgr.c foComplete).
> This is calling VG_(get_fname_no_cxx_demangle). I will imagine that then no
> demangler at all will be called.

O, you are right. I forgot. Thanks.
 
> So, if ever the Ada demangler is made working, I do not think that this will
> impact suppression matching

Lets try out Paul's patch.
Comment 16 Paul Floyd 2024-10-13 11:39:37 UTC
If _GLOBAL_ comes up as an issue I'll open a new item.

commit 27fe47bd8a6d803966fee718fd4f8c047780cf51 (HEAD -> master, origin/users/paulf/try-bug445235, origin/master, origin/HEAD, bug445235)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sat Oct 12 16:03:22 2024 +0200

    Bug 445235 - Java/Ada/D demangling is probably broken
    
    Ada demangling is now functional.
    
    There maybe some missing functionality like the _GLOBAL_
    mentioned in the bugzilla item.