Bug 259977 - Valgrind segfaults doing __builtin_longjmp
Summary: Valgrind segfaults doing __builtin_longjmp
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.6 SVN
Platform: Unlisted Binaries Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
: 214223 (view as bug list)
Depends on:
Blocks: 267630
  Show dependency treegraph
 
Reported: 2010-12-15 18:15 UTC by Maynard Johnson
Modified: 2011-05-16 12:55 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Patch to fix this bug (16.04 KB, patch)
2010-12-15 18:15 UTC, Maynard Johnson
Details
version 2 patch (16.13 KB, patch)
2011-03-29 01:41 UTC, Maynard Johnson
Details
proof of concept that homegrown minimal-setjmp/longjmp is doable (5.58 KB, patch)
2011-04-08 00:13 UTC, Julian Seward
Details
modified proof of concept that homegrown minimal-setjmp/longjmp is doable (4.37 KB, patch)
2011-04-09 03:43 UTC, Peter Bergner
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Maynard Johnson 2010-12-15 18:15:33 UTC
Created attachment 54591 [details]
Patch to fix this bug

Valgrind will segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under certain conditions:
   - Running on a processor where one of the capability checks will fail
     (e.g., IBM POWER5, checking for Altivec)
            AND
   - Valgrind was built with gcc 4.4.4 or later

As implied above, I've seen this failure occur on IBM Power processor families.  When valgrind starts up, the machine_get_hwcaps function is used to ascertain hardware capabilities and support for various instructions.  The mechanism to determine whether or not a particular instruction is supported by the current processor is to set up a signal handler, call __builtin_setjmp, then try to execute the instruction; if it's an illegal instruction, a SIGILL is generated and the signal handler does a __builtin_longjmp back to a location where a capabilities variable is set to "False" to indicate the particular capability is not present.

The root problem is that the newer gcc was not generating the appropriate code to long jump back to the correct location.  Instead, the long jump target was indeterminate, which in this case, resulted in a segmentation fault.

Reporting the problem to my resident gcc expert gave no help other than telling us that:
1) __builtin_[set|long]jmp are for internal gcc use only; and 
2) longjmp from a signal handler is undefined (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1 The signal function", point #5).  Questioning whether this limitation applies to __builtin_longjmp, as well as to longjmp results in being told "what part of internal gcc use do you not understand?"

An alternate solution for ascertaining hardware capabilities on IBM Power processors is to access the aux vector, which is passed to a program after the env array (with one NULL in between).  The attached patch uses this alternate solution.

Note that there are other instances of __builtin_[set|long]jmp in valgrind which are not addressed by the patch attached to this bug, as follows:

Usage of __builtin_[set|long]jmp with signal handlers
----------------------------------------------------
   - coregrind/m_machine.c:  For ARM architecture
   - exp-ptrcheck/tests (several users of '#define TTT' which makes use of
     __builtin_setjmp)
   - memcheck/tests (badjmup2.c)
   - memcheck/mc_leakcheck.c

Uses of __builtin_[set|long]jmp without signal handlers
----------------------------------------------------
   - coregrind/m_debuginfo/readdwarf.c
   - coregrind/m_scheduler/scheduler.c

I suspect that the same segfault could occur on ARM when checking h/w capabilities, assuming valgrind is built with gcc 4.4.4 or later and is run on an ARM processor that is old enough to fail one of the capabilities checks.  Someone knowledgeable with the ARM architecture should verify and open a bug as needed.

The other cases need further investigation to determine if correct code is being generated.
Comment 1 Julian Seward 2011-01-11 00:01:27 UTC
Jakub, have you seen any similar failures w/ your valgrind builds
on power/powerpc?
Comment 2 Maynard Johnson 2011-03-08 16:35:01 UTC
I hope this bug can be resolved soon, as the new POWER7 support patch I just submitted (via Valgrind bug # 267630) depends on this fix.  Without it, I'd have to do POWER7 VSX capability checking using the existing technique of the __builtin_[set|long]jmp.  And then, as described above, we would encounter the segfault on POWER6 and older processors when valgrind is built on gcc 4.4.4 or later.  Thanks.
Comment 3 Maynard Johnson 2011-03-29 01:41:07 UTC
Created attachment 58413 [details]
version 2 patch

I respun the patch against today's svn.
Comment 4 Julian Seward 2011-04-01 19:04:55 UTC
(In reply to comment #0)
Ehm, if I can ask some more questions about this ..

> Valgrind will segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under
> certain conditions:
>    - Running on a processor where one of the capability checks will fail
>      (e.g., IBM POWER5, checking for Altivec)
>             AND
>    - Valgrind was built with gcc 4.4.4 or later

So this is a regression introduced in gcc 4.4.4 but not present in
(eg) 4.4.3 ?  Do you have further analysis of the failure, and/or a
GCC bugzilla entry for it?

> Reporting the problem to my resident gcc expert gave no help other than telling
> us that:
> 1) __builtin_[set|long]jmp are for internal gcc use only; and 
> 2) longjmp from a signal handler is undefined (see
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1
> The signal function", point #5).  Questioning whether this limitation applies
> to __builtin_longjmp, as well as to longjmp results in being told "what part of
> internal gcc use do you not understand?"

We've been using __builtin_longjmp since, well, the beginning of the
project, without difficulty.

Does it fail both for 64-bit processes, 32-bit processes, or both?
I'm just wondering if this is somehow TOC-pointer related.
Comment 5 Maynard Johnson 2011-04-01 22:44:53 UTC
(In reply to comment #4)
> (In reply to comment #0)
> Ehm, if I can ask some more questions about this ..
> 
> > Valgrind will segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under
> > certain conditions:
> >    - Running on a processor where one of the capability checks will fail
> >      (e.g., IBM POWER5, checking for Altivec)
> >             AND
> >    - Valgrind was built with gcc 4.4.4 or later
> 
> So this is a regression introduced in gcc 4.4.4 but not present in
> (eg) 4.4.3 ?  Do you have further analysis of the failure, and/or a
> GCC bugzilla entry for it?
The gcc guys I talked to here at IBM would not say this is a regression, since there was never any guarantee of behavior for external use.  They were absolutely adamant about it and would not even consider looking into why it no longer worked for the case in point.  :-/
> 
> > Reporting the problem to my resident gcc expert gave no help other than telling
> > us that:
> > 1) __builtin_[set|long]jmp are for internal gcc use only; and 
> > 2) longjmp from a signal handler is undefined (see
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1
> > The signal function", point #5).  Questioning whether this limitation applies
> > to __builtin_longjmp, as well as to longjmp results in being told "what part of
> > internal gcc use do you not understand?"
> 
> We've been using __builtin_longjmp since, well, the beginning of the
> project, without difficulty.
> 
> Does it fail both for 64-bit processes, 32-bit processes, or both?
> I'm just wondering if this is somehow TOC-pointer related.
It fails for both 32-bit and 64-bit.  I imagine you'd really like to be able to reproduce this yourself.  You could do so by pulling the latest release from the gcc home page (http://gcc.gnu.org/), building it, and then use it to build valgrind.  Then run 'none/tests/ppc64/jm-insns -a' under valgrind to see the segfault.

An alternative to building gcc yourself is to obtain the "IBM Advance Toolchain" rpms, available at: 

     ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at4.0

You'll find both redhat (5 and 6) and sles (10 and 11) directories there.  You would need to install both the advance-toolchain-at4.0-runtime-4.0-1.ppc64.rpm and the advance-toolchain-at4.0-devel-4.0-1.ppc64.rpm.

What is this "Advance Toolchain"?  Here's a blurb from an IBM website:

"The Advance Toolchain is a collection of free-software application development products specifically tailored to make early use of IBM hardware features. It includes GCC (GNU Compiler Collection), GLIBC (GNU C Libraries), GNU binary utilities, GDB ( GNU debugger), and the performance analysis tools (QProfile and Valgrind) in a self contained Toochain. The Advance Toolchain duplicates the function of packages already delivered by Linux distributors but provides additional features for the POWER6 and now POWER7 family of processors as indicated in the Advance Toolchain Release Notes 'Features' section."

-Maynard
Comment 6 Maynard Johnson 2011-04-02 00:06:20 UTC
[snip]
> An alternative to building gcc yourself is to obtain the "IBM Advance
> Toolchain" rpms, available at: 
> 
>      ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at4.0
> 
> You'll find both redhat (5 and 6) and sles (10 and 11) directories there.  You
> would need to install both the advance-toolchain-at4.0-runtime-4.0-1.ppc64.rpm
> and the advance-toolchain-at4.0-devel-4.0-1.ppc64.rpm.
If you do install the AT, all you have to do is 'export PATH=/opt/at4.0/bin:$PATH' and then the AT's compiler (and other associated tooling) will be used to do your builds.  The AT will link the binaries it builds to the AT runtime libraries, which is why you need to install both the runtime and the devel packages.

-Maynard

[snip]
Comment 7 Julian Seward 2011-04-04 20:58:47 UTC
I can reproduce this, using Debian 6.0.1 on a PPC750.  This ships
with gcc 4.4.5.
Comment 8 Julian Seward 2011-04-05 16:53:28 UTC
I see the attraction of the patch.  Problem is that losing setjmp/longjmp
more generally (as per your comment #0) is going to be problematic, and
so I'd like to see if we can somehow retain that.  If we could link libc
that would be simple, but we can't, hence the use of the gcc builtin
versions.

Am building gcc-4.6.0 on ppc32 to see if the __builtin_longjmp problem
got fixed since 4.4.4.
Comment 9 Maynard Johnson 2011-04-05 18:26:08 UTC
(In reply to comment #8)
> I see the attraction of the patch.  Problem is that losing setjmp/longjmp
> more generally (as per your comment #0) is going to be problematic, and
> so I'd like to see if we can somehow retain that.  If we could link libc
> that would be simple, but we can't, hence the use of the gcc builtin
> versions.
> 
> Am building gcc-4.6.0 on ppc32 to see if the __builtin_longjmp problem
> got fixed since 4.4.4.
Actually, regardless of what you find with gcc 4.6, I would hope this patch can be accepted for ppc[64] capabilities checking.
Comment 10 Julian Seward 2011-04-06 10:01:56 UTC
It also fails with gcc-4.6.0.  Oh well.

I see that there are two different issues here: one is the loss of
usability of __builtin_{set,long}jmp, and the other is enhancement
of capabilities checking for ppc, to support bug 267630.  Addressing
the former problem first, I think it would not be hard to implement
a minimal setjmp/longjmp replacement that just saves and restores
the integer registers -- that's all that's needed.
Comment 11 Julian Seward 2011-04-08 00:13:42 UTC
Created attachment 58696 [details]
proof of concept that homegrown minimal-setjmp/longjmp is doable

Stops the existing SIGILL-based feature testing segfaulting
on ppc32-linux.
Comment 12 Peter Bergner 2011-04-08 17:30:28 UTC
Julian, looking at your patch in Comment #11, I have a couple of comments.

First off, there is no need to save/restore volatile registers.  In fact, I recommend not saving them, so people don't become accustomed to them being saved/restored and becoming dependent on it.  That said, I think you just want to save r1 and r14-r31, plus the lr/cr you're already saving and then restore the same regs plus the r3 return value.

Secondly, I was going to complain that you were not saving/restoring the FP and Altivec non-volatile registers, but I think I've convinced myself that you are correct and we don't need to for this specific instance of catching SIGILLs, since if the system you're on doesn't support FP instructions, then we'd SIGILL much before the call to mysetjmp if there had been any FP instructions executed, so no problem there.  Vice versa, if the system does support FP instructions, then we won't SIGILL on the test instruction and since your mysetjmp function clearly doesn't use any FP instructions, then we're good there too.  Ditto for Altivec.
Comment 13 Peter Bergner 2011-04-09 03:43:05 UTC
Created attachment 58732 [details]
modified proof of concept that homegrown minimal-setjmp/longjmp is doable

Here's a slightly modified proof of concept patch that only saves and restores the non-volatile registers.
Comment 14 Julian Seward 2011-04-12 00:07:00 UTC
Fixed, revs 11687,8,9 and 11690.
Comment 15 Maynard Johnson 2011-04-12 19:08:14 UTC
Julian,
Unfortunately, something broke between the proof of concept and what got committed to svn.  With the proof of concepts (your 32-bit version and my 64-bit version), I got expected results when trying to run a testcase having Altivec instructions under Valgrind on a POWER5 system (i.e., no Altivec):

------------
mpj@begna:~/VG-setjmp/vg-setjmp> ./vg-in-place none/tests/ppc32/jm-insns -a
==3924== Memcheck, a memory error detector
==3924== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==3924== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info
==3924== Command: none/tests/ppc32/jm-insns -a
==3924==
disInstr(ppc): declined to decode an AltiVec insn.
disInstr(ppc): unhandled instruction: 0x7C0048CE
                 primary 31(0x1F), secondary 206(0xCE)
==3924== valgrind: Unrecognised instruction at address 0x10004254.
==3924==    at 0x10004254: main (jm-insns.c:4422)
          . . . 
          . . .
----------------

But now, running the same testcase on the POWER5 system, I get the following error:

-----------------
mpj@begna:~/vg-upstream/valgrind> ./vg-in-place --tool=none none/tests/ppc32/jm-insns -a
==3958== Nulgrind, the minimal Valgrind tool
==3958== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote.
==3958== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info
==3958== Command: none/tests/ppc32/jm-insns -a
==3958==
==3958==
==3958== Process terminating with default action of signal 4 (SIGILL)
==3958==  Illegal opcode at address 0x568D64C
==3958==    at 0x100074B8: main (jm-insns.c:4422)
==3958==
./vg-in-place: line 31:  3958 Illegal instruction     VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"
--------------------------

This error implies that valgrind actually decoded the insn and then we die when the native insn is attempted in the dispatcher. I'll take a look and see if I can find the problem.

NOTE: Running this testcase using upstream valgrind on a system that has Altivec works as expected.
Comment 16 Julian Seward 2011-04-12 19:30:07 UTC
Hmm, that's odd.  I verified that r11690 does correct feature checking
on 32-bits, on a PPC750 (no Altivec), so there's at least one SIGILL
caught and correctly handled.

I also tried the 64-bit stuff, but I only have a PPC970 to test on, so
none of the tested instructions actually faulted.  So I introduced a
fake check of a ".long 0" to check that the SIGILL handler caught it,
and recovered correctly, and that worked OK too.

You rebuilt from distclean, right?
Comment 17 Maynard Johnson 2011-04-12 20:58:53 UTC
(In reply to comment #16)
> Hmm, that's odd.  I verified that r11690 does correct feature checking
> on 32-bits, on a PPC750 (no Altivec), so there's at least one SIGILL
> caught and correctly handled.
> 
> I also tried the 64-bit stuff, but I only have a PPC970 to test on, so
> none of the tested instructions actually faulted.  So I introduced a
> fake check of a ".long 0" to check that the SIGILL handler caught it,
> and recovered correctly, and that worked OK too.
> 
> You rebuilt from distclean, right?
Sorry, false alarm caused by a bug I introduced in version 2 of this patch.  The changes I made to none/tests/ppc{32|64}/Makefile.am were not correct for all environments.  The environment I was testing on today was a POWER5 with a new toolchain that recognizes POWER7 instructions.  The Makefile was incorrectly adding '-mvsx' into the compilation of jm-insns.c (which should be Altivec testing, not VSX testing), resulting in some VSX instructions being automatically used by the compiler for loading vectors.  I coded a fix for that, and verified your upstream changes with VG_MINIMAL_SETJMP work correctly.

I will open a new bug report to fix the regression I introduced in version 2.
Comment 18 Maynard Johnson 2011-04-12 21:01:47 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Hmm, that's odd.  I verified that r11690 does correct feature checking
> > on 32-bits, on a PPC750 (no Altivec), so there's at least one SIGILL
> > caught and correctly handled.
> > 
> > I also tried the 64-bit stuff, but I only have a PPC970 to test on, so
> > none of the tested instructions actually faulted.  So I introduced a
> > fake check of a ".long 0" to check that the SIGILL handler caught it,
> > and recovered correctly, and that worked OK too.
> > 
> > You rebuilt from distclean, right?
> Sorry, false alarm caused by a bug I introduced in version 2 of this patch. 
I meant to say "version 2 of the patch attached to bug #267630".
Comment 19 Julian Seward 2011-04-13 01:04:12 UTC
Closing, since the segfaulting is now fixed, afaics.
Comment 20 Julian Seward 2011-05-16 12:55:49 UTC
*** Bug 214223 has been marked as a duplicate of this bug. ***