Bug 443605 - Don't call final_tidyup (__libc_freeres) on FatalSignal
Summary: Don't call final_tidyup (__libc_freeres) on FatalSignal
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-11 16:37 UTC by Mark Wielaard
Modified: 2021-10-12 20:59 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2021-10-11 16:37:26 UTC
When a program gets a fatal signal (one it doesn't handle) valgrind terminates the program. Before termination it will try to call final_tidyup which tries to run __libc_freeres and __gnu_cxx::__freeres to get rid of some memory glibc or libstdc++ don't normally release.

But when the program got the fatal signal in a critical section inside glibc it might leave the datastructures in a bad state and cause __libc_freeres to crash.  This makes valgrind itself crash just before producing its own error summary, making the valgrind run unusable.

A reproducer can found at https://bugzilla.redhat.com/show_bug.cgi?id=1952836 and https://bugzilla.redhat.com/show_bug.cgi?id=1225994#c7

This reproducer is really a worse case scenario with multiple threads racing to get into the critical section that when interrupted will make __libc_freeres unable to cleanup. But it seems a good policy in general. If a program is terminated by a fatal signal instead of normal termination, it seems not having some of the glibc/libstdc++ resource cleaned up is an expected thing.

The following patch implements this:

diff --git a/coregrind/m_main.c b/coregrind/m_main.c
index 56f9c6cbf..70b6c0549 100644
--- a/coregrind/m_main.c
+++ b/coregrind/m_main.c
@@ -2168,6 +2168,7 @@ void shutdown_actions_NORETURN( ThreadId tid,
             || tids_schedretcode == VgSrc_ExitProcess
              || tids_schedretcode == VgSrc_FatalSig );
 
+   /* Try to do final tidyup on "normal" exit, not on FatalSig.  */
    if (tids_schedretcode == VgSrc_ExitThread) {
 
       // We are the last surviving thread.  Right?
@@ -2185,7 +2186,7 @@ void shutdown_actions_NORETURN( ThreadId tid,
       vg_assert(VG_(is_running_thread)(tid));
       vg_assert(VG_(count_living_threads)() == 1);
 
-   } else {
+   } else if (tids_schedretcode == VgSrc_ExitProcess) {
 
       // We may not be the last surviving thread.  However, we
       // want to shut down the entire process.  We hold the lock
Comment 1 Mark Wielaard 2021-10-12 20:59:56 UTC
commit cf9ebf8313952caed53394498fe849251f477c97
Author: Mark Wielaard <mark@klomp.org>
Date:   Tue Oct 12 22:47:57 2021 +0200

    coregrind: Don't call final_tidyup (__libc_freeres) on FatalSignal
    
    When a program gets a fatal signal (one it doesn't handle) valgrind
    terminates the program. Before termination it will try to call
    final_tidyup which tries to run __libc_freeres and
    __gnu_cxx::__freeres to get rid of some memory glibc or libstdc++
    don't normally release.
    
    But when the program got the fatal signal in a critical section inside
    glibc it might leave the datastructures in a bad state and cause
    __libc_freeres to crash.  This makes valgrind itself crash just before
    producing its own error summary, making the valgrind run unusable.
    
    A reproducer can found at
    https://bugzilla.redhat.com/show_bug.cgi?id=1952836 and
    https://bugzilla.redhat.com/show_bug.cgi?id=1225994#c7
    
    This reproducer is really a worse case scenario with multiple threads
    racing to get into the critical section that when interrupted will
    make __libc_freeres unable to cleanup. But it seems a good policy in
    general. If a program is terminated by a fatal signal instead of
    normal termination, it seems not having some of the glibc/libstdc++
    resource cleaned up is an expected thing.
    
    https://bugs.kde.org/show_bug.cgi?id=443605