Bug 496370 - Illumos: signal handling is broken
Summary: Illumos: signal handling is broken
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.24 GIT
Platform: Other Unspecified
: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-11-17 11:03 UTC by Paul Floyd
Modified: 2025-02-14 19:49 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Floyd 2024-11-17 11:03:15 UTC
I think that this also applies to unpatched Solaris 11.4

Example:

paulf@openindiana:~/valgrind$ ./vg-in-place memcheck/tests/signal2
==8428== Memcheck, a memory error detector
==8428== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==8428== Using Valgrind-3.25.0.GIT and LibVEX; rerun with -h for copyright info
==8428== Command: memcheck/tests/signal2
==8428== 
installing sig handler
doing bad thing
==8428== Invalid write of size 4
==8428==    at 0x401370: main (signal2.c:17)
==8428==  Address 0xfffff1000 is not stack'd, malloc'd or (recently) free'd
==8428== 
Segmentation Fault (core dumped)

The invalid write is expected, not the segfault.

I've had a quick look at the sigframe code. There have been changes to the Illumos ucontext. However Valgrind on Illumos and Solaris includes the system headers and uses 'ucontext' from the system headers.

#define vki_ucontext ucontext

In gdb it seems that the problem is with VG_(sigframe_create) and VG_(save_context)

     538    /* Sigmask */                                                                                                                                                                                        >   539    uc->uc_sigmask = tst->sig_mask;   

This seems to be causing a segfault, but I can't see why just yet.
Comment 1 Paul Floyd 2024-11-17 18:44:24 UTC
memcheck/tests/signal 2 also fails on Solaris 11.3. 

Building it for x86 version works on Illumos and S11.3.
Comment 2 Paul Floyd 2024-12-03 07:47:27 UTC
I'm fairly sure that this is a regression. On S11.3 I have a build that I did in Dec 2023 that seems OK for several of the test cases with signals.

Need to give git bisect a go.
Comment 3 Paul Floyd 2024-12-04 07:52:00 UTC
It looks like this is compiler related.

Back on S11.3 with an old GCC (4.8.2) I get

== 860 tests, 18 stderr failures, 1 stdout failure, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
memcheck/tests/amd64-solaris/context_fpu (stderr)
memcheck/tests/solaris/lsframe2          (stderr)
memcheck/tests/vcpu_fnfns                (stdout)
helgrind/tests/bar_bad                   (stderr)
helgrind/tests/bar_trivial               (stderr)
helgrind/tests/bug322621                 (stderr)
helgrind/tests/free_is_write             (stderr)
helgrind/tests/pth_barrier1              (stderr)
helgrind/tests/pth_barrier2              (stderr)
helgrind/tests/pth_barrier3              (stderr)
helgrind/tests/pth_mempcpy_false_races   (stderr)
helgrind/tests/tc21_pthonce              (stderr)
drd/tests/pth_cond_destroy_busy          (stderr)
drd/tests/std_mutex                      (stderr)
none/tests/fdleak_dup2                   (stderr)
none/tests/fdleak_dup2_xml               (stderr)
none/tests/fdleak_pipe_xml               (stderr)
none/tests/fdleak_socketpair_xml         (stderr)
none/tests/file_dclose                   (stderr)

and specifically memcheck/tests/signal2 is OK

GCC 4.8.2 is too old to build Valgrind - there are some opcodes in asm that it doesn't know and it can't compile any C++ from C++17 onwards (and I'm not going to bother adding any configure checks for them).

So I've been using a more recent GCC (12.0.1 20220425). That doesn't have build issues but signals are broken.

Illumos is also using a relatively recent GCC (gcc (OpenIndiana 13.3.0-oi-0) 13.3.0).

Valgrind built with the old GCC is OK with testcases built with GCC 12, so it's the build of Valgrind that matters.

This looks like a big rabbit hole.
Comment 4 Paul Floyd 2025-02-14 08:01:50 UTC
I've tried a GCC 12 build of Valgrind with both a GCC 4.8.2 and GCC 12 built versions of signal2 and they both crash. So it's not a change in the compilation of the testcase that's the problem.

Next, in gdb I get the segfault here in syswrap-solaris.c:


│      534    /* Clear uc->vki_uc_signo.  This slot is used by the signal machinery to
│      535       store a signal number. */
│      536    VKI_UC_SIGNO(uc) = 0;
│      537
│      538    /* Sigmask */
│  >   539    uc->uc_sigmask = tst->sig_mask;
│      540    VG_TRACK(post_mem_write, part, tid, (Addr)&uc->uc_sigmask,
│      541             sizeof(uc->uc_sigmask));

If I compile syswrap-solaris.c without -O2 then the signal2 test works. And in fact the problem exists with -O1 and not with -O0.
Comment 5 Paul Floyd 2025-02-14 09:35:48 UTC
I'll try

#pragma GCC push_options
#pragma GCC optimize ("O0")

void VG_(save_context)(ThreadId tid, vki_ucontext_t *uc, CorePart part)
...

#pragma GCC pop_options
Comment 6 Paul Floyd 2025-02-14 19:49:51 UTC
commit ba5e30347fa5691cc9b0c4c620ee41daa135e610
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Fri Feb 14 20:45:02 2025 +0100

    Bug 496370 - Illumos: signal handling is broken
    
    This isn't a great fix, it just turns off optimization for
    a couple of signal frame functions. If ever I have time I'll try
    to find out out which part of -O1 is responsible, and maybe from
    that also exactly what part of the code.