Created attachment 54591 [details] Patch to fix this bug Valgrind will segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under certain conditions: - Running on a processor where one of the capability checks will fail (e.g., IBM POWER5, checking for Altivec) AND - Valgrind was built with gcc 4.4.4 or later As implied above, I've seen this failure occur on IBM Power processor families. When valgrind starts up, the machine_get_hwcaps function is used to ascertain hardware capabilities and support for various instructions. The mechanism to determine whether or not a particular instruction is supported by the current processor is to set up a signal handler, call __builtin_setjmp, then try to execute the instruction; if it's an illegal instruction, a SIGILL is generated and the signal handler does a __builtin_longjmp back to a location where a capabilities variable is set to "False" to indicate the particular capability is not present. The root problem is that the newer gcc was not generating the appropriate code to long jump back to the correct location. Instead, the long jump target was indeterminate, which in this case, resulted in a segmentation fault. Reporting the problem to my resident gcc expert gave no help other than telling us that: 1) __builtin_[set|long]jmp are for internal gcc use only; and 2) longjmp from a signal handler is undefined (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1 The signal function", point #5). Questioning whether this limitation applies to __builtin_longjmp, as well as to longjmp results in being told "what part of internal gcc use do you not understand?" An alternate solution for ascertaining hardware capabilities on IBM Power processors is to access the aux vector, which is passed to a program after the env array (with one NULL in between). The attached patch uses this alternate solution. Note that there are other instances of __builtin_[set|long]jmp in valgrind which are not addressed by the patch attached to this bug, as follows: Usage of __builtin_[set|long]jmp with signal handlers ---------------------------------------------------- - coregrind/m_machine.c: For ARM architecture - exp-ptrcheck/tests (several users of '#define TTT' which makes use of __builtin_setjmp) - memcheck/tests (badjmup2.c) - memcheck/mc_leakcheck.c Uses of __builtin_[set|long]jmp without signal handlers ---------------------------------------------------- - coregrind/m_debuginfo/readdwarf.c - coregrind/m_scheduler/scheduler.c I suspect that the same segfault could occur on ARM when checking h/w capabilities, assuming valgrind is built with gcc 4.4.4 or later and is run on an ARM processor that is old enough to fail one of the capabilities checks. Someone knowledgeable with the ARM architecture should verify and open a bug as needed. The other cases need further investigation to determine if correct code is being generated.
Jakub, have you seen any similar failures w/ your valgrind builds on power/powerpc?
I hope this bug can be resolved soon, as the new POWER7 support patch I just submitted (via Valgrind bug # 267630) depends on this fix. Without it, I'd have to do POWER7 VSX capability checking using the existing technique of the __builtin_[set|long]jmp. And then, as described above, we would encounter the segfault on POWER6 and older processors when valgrind is built on gcc 4.4.4 or later. Thanks.
Created attachment 58413 [details] version 2 patch I respun the patch against today's svn.
(In reply to comment #0) Ehm, if I can ask some more questions about this .. > Valgrind will segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under > certain conditions: > - Running on a processor where one of the capability checks will fail > (e.g., IBM POWER5, checking for Altivec) > AND > - Valgrind was built with gcc 4.4.4 or later So this is a regression introduced in gcc 4.4.4 but not present in (eg) 4.4.3 ? Do you have further analysis of the failure, and/or a GCC bugzilla entry for it? > Reporting the problem to my resident gcc expert gave no help other than telling > us that: > 1) __builtin_[set|long]jmp are for internal gcc use only; and > 2) longjmp from a signal handler is undefined (see > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1 > The signal function", point #5). Questioning whether this limitation applies > to __builtin_longjmp, as well as to longjmp results in being told "what part of > internal gcc use do you not understand?" We've been using __builtin_longjmp since, well, the beginning of the project, without difficulty. Does it fail both for 64-bit processes, 32-bit processes, or both? I'm just wondering if this is somehow TOC-pointer related.
(In reply to comment #4) > (In reply to comment #0) > Ehm, if I can ask some more questions about this .. > > > Valgrind will segfault in coregrind/m_machine.c:VG_(machine_get_hwcaps) under > > certain conditions: > > - Running on a processor where one of the capability checks will fail > > (e.g., IBM POWER5, checking for Altivec) > > AND > > - Valgrind was built with gcc 4.4.4 or later > > So this is a regression introduced in gcc 4.4.4 but not present in > (eg) 4.4.3 ? Do you have further analysis of the failure, and/or a > GCC bugzilla entry for it? The gcc guys I talked to here at IBM would not say this is a regression, since there was never any guarantee of behavior for external use. They were absolutely adamant about it and would not even consider looking into why it no longer worked for the case in point. :-/ > > > Reporting the problem to my resident gcc expert gave no help other than telling > > us that: > > 1) __builtin_[set|long]jmp are for internal gcc use only; and > > 2) longjmp from a signal handler is undefined (see > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, section "7.14.1.1 > > The signal function", point #5). Questioning whether this limitation applies > > to __builtin_longjmp, as well as to longjmp results in being told "what part of > > internal gcc use do you not understand?" > > We've been using __builtin_longjmp since, well, the beginning of the > project, without difficulty. > > Does it fail both for 64-bit processes, 32-bit processes, or both? > I'm just wondering if this is somehow TOC-pointer related. It fails for both 32-bit and 64-bit. I imagine you'd really like to be able to reproduce this yourself. You could do so by pulling the latest release from the gcc home page (http://gcc.gnu.org/), building it, and then use it to build valgrind. Then run 'none/tests/ppc64/jm-insns -a' under valgrind to see the segfault. An alternative to building gcc yourself is to obtain the "IBM Advance Toolchain" rpms, available at: ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at4.0 You'll find both redhat (5 and 6) and sles (10 and 11) directories there. You would need to install both the advance-toolchain-at4.0-runtime-4.0-1.ppc64.rpm and the advance-toolchain-at4.0-devel-4.0-1.ppc64.rpm. What is this "Advance Toolchain"? Here's a blurb from an IBM website: "The Advance Toolchain is a collection of free-software application development products specifically tailored to make early use of IBM hardware features. It includes GCC (GNU Compiler Collection), GLIBC (GNU C Libraries), GNU binary utilities, GDB ( GNU debugger), and the performance analysis tools (QProfile and Valgrind) in a self contained Toochain. The Advance Toolchain duplicates the function of packages already delivered by Linux distributors but provides additional features for the POWER6 and now POWER7 family of processors as indicated in the Advance Toolchain Release Notes 'Features' section." -Maynard
[snip] > An alternative to building gcc yourself is to obtain the "IBM Advance > Toolchain" rpms, available at: > > ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at4.0 > > You'll find both redhat (5 and 6) and sles (10 and 11) directories there. You > would need to install both the advance-toolchain-at4.0-runtime-4.0-1.ppc64.rpm > and the advance-toolchain-at4.0-devel-4.0-1.ppc64.rpm. If you do install the AT, all you have to do is 'export PATH=/opt/at4.0/bin:$PATH' and then the AT's compiler (and other associated tooling) will be used to do your builds. The AT will link the binaries it builds to the AT runtime libraries, which is why you need to install both the runtime and the devel packages. -Maynard [snip]
I can reproduce this, using Debian 6.0.1 on a PPC750. This ships with gcc 4.4.5.
I see the attraction of the patch. Problem is that losing setjmp/longjmp more generally (as per your comment #0) is going to be problematic, and so I'd like to see if we can somehow retain that. If we could link libc that would be simple, but we can't, hence the use of the gcc builtin versions. Am building gcc-4.6.0 on ppc32 to see if the __builtin_longjmp problem got fixed since 4.4.4.
(In reply to comment #8) > I see the attraction of the patch. Problem is that losing setjmp/longjmp > more generally (as per your comment #0) is going to be problematic, and > so I'd like to see if we can somehow retain that. If we could link libc > that would be simple, but we can't, hence the use of the gcc builtin > versions. > > Am building gcc-4.6.0 on ppc32 to see if the __builtin_longjmp problem > got fixed since 4.4.4. Actually, regardless of what you find with gcc 4.6, I would hope this patch can be accepted for ppc[64] capabilities checking.
It also fails with gcc-4.6.0. Oh well. I see that there are two different issues here: one is the loss of usability of __builtin_{set,long}jmp, and the other is enhancement of capabilities checking for ppc, to support bug 267630. Addressing the former problem first, I think it would not be hard to implement a minimal setjmp/longjmp replacement that just saves and restores the integer registers -- that's all that's needed.
Created attachment 58696 [details] proof of concept that homegrown minimal-setjmp/longjmp is doable Stops the existing SIGILL-based feature testing segfaulting on ppc32-linux.
Julian, looking at your patch in Comment #11, I have a couple of comments. First off, there is no need to save/restore volatile registers. In fact, I recommend not saving them, so people don't become accustomed to them being saved/restored and becoming dependent on it. That said, I think you just want to save r1 and r14-r31, plus the lr/cr you're already saving and then restore the same regs plus the r3 return value. Secondly, I was going to complain that you were not saving/restoring the FP and Altivec non-volatile registers, but I think I've convinced myself that you are correct and we don't need to for this specific instance of catching SIGILLs, since if the system you're on doesn't support FP instructions, then we'd SIGILL much before the call to mysetjmp if there had been any FP instructions executed, so no problem there. Vice versa, if the system does support FP instructions, then we won't SIGILL on the test instruction and since your mysetjmp function clearly doesn't use any FP instructions, then we're good there too. Ditto for Altivec.
Created attachment 58732 [details] modified proof of concept that homegrown minimal-setjmp/longjmp is doable Here's a slightly modified proof of concept patch that only saves and restores the non-volatile registers.
Fixed, revs 11687,8,9 and 11690.
Julian, Unfortunately, something broke between the proof of concept and what got committed to svn. With the proof of concepts (your 32-bit version and my 64-bit version), I got expected results when trying to run a testcase having Altivec instructions under Valgrind on a POWER5 system (i.e., no Altivec): ------------ mpj@begna:~/VG-setjmp/vg-setjmp> ./vg-in-place none/tests/ppc32/jm-insns -a ==3924== Memcheck, a memory error detector ==3924== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==3924== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info ==3924== Command: none/tests/ppc32/jm-insns -a ==3924== disInstr(ppc): declined to decode an AltiVec insn. disInstr(ppc): unhandled instruction: 0x7C0048CE primary 31(0x1F), secondary 206(0xCE) ==3924== valgrind: Unrecognised instruction at address 0x10004254. ==3924== at 0x10004254: main (jm-insns.c:4422) . . . . . . ---------------- But now, running the same testcase on the POWER5 system, I get the following error: ----------------- mpj@begna:~/vg-upstream/valgrind> ./vg-in-place --tool=none none/tests/ppc32/jm-insns -a ==3958== Nulgrind, the minimal Valgrind tool ==3958== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote. ==3958== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info ==3958== Command: none/tests/ppc32/jm-insns -a ==3958== ==3958== ==3958== Process terminating with default action of signal 4 (SIGILL) ==3958== Illegal opcode at address 0x568D64C ==3958== at 0x100074B8: main (jm-insns.c:4422) ==3958== ./vg-in-place: line 31: 3958 Illegal instruction VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@" -------------------------- This error implies that valgrind actually decoded the insn and then we die when the native insn is attempted in the dispatcher. I'll take a look and see if I can find the problem. NOTE: Running this testcase using upstream valgrind on a system that has Altivec works as expected.
Hmm, that's odd. I verified that r11690 does correct feature checking on 32-bits, on a PPC750 (no Altivec), so there's at least one SIGILL caught and correctly handled. I also tried the 64-bit stuff, but I only have a PPC970 to test on, so none of the tested instructions actually faulted. So I introduced a fake check of a ".long 0" to check that the SIGILL handler caught it, and recovered correctly, and that worked OK too. You rebuilt from distclean, right?
(In reply to comment #16) > Hmm, that's odd. I verified that r11690 does correct feature checking > on 32-bits, on a PPC750 (no Altivec), so there's at least one SIGILL > caught and correctly handled. > > I also tried the 64-bit stuff, but I only have a PPC970 to test on, so > none of the tested instructions actually faulted. So I introduced a > fake check of a ".long 0" to check that the SIGILL handler caught it, > and recovered correctly, and that worked OK too. > > You rebuilt from distclean, right? Sorry, false alarm caused by a bug I introduced in version 2 of this patch. The changes I made to none/tests/ppc{32|64}/Makefile.am were not correct for all environments. The environment I was testing on today was a POWER5 with a new toolchain that recognizes POWER7 instructions. The Makefile was incorrectly adding '-mvsx' into the compilation of jm-insns.c (which should be Altivec testing, not VSX testing), resulting in some VSX instructions being automatically used by the compiler for loading vectors. I coded a fix for that, and verified your upstream changes with VG_MINIMAL_SETJMP work correctly. I will open a new bug report to fix the regression I introduced in version 2.
(In reply to comment #17) > (In reply to comment #16) > > Hmm, that's odd. I verified that r11690 does correct feature checking > > on 32-bits, on a PPC750 (no Altivec), so there's at least one SIGILL > > caught and correctly handled. > > > > I also tried the 64-bit stuff, but I only have a PPC970 to test on, so > > none of the tested instructions actually faulted. So I introduced a > > fake check of a ".long 0" to check that the SIGILL handler caught it, > > and recovered correctly, and that worked OK too. > > > > You rebuilt from distclean, right? > Sorry, false alarm caused by a bug I introduced in version 2 of this patch. I meant to say "version 2 of the patch attached to bug #267630".
Closing, since the segfaulting is now fixed, afaics.
*** Bug 214223 has been marked as a duplicate of this bug. ***