When running programs that use Boost.Context (I am using the trunk version of Boost, and the Boost.Context test program in libs/context/test/test_context.cpp triggers the issue) on Linux on an x86-64 system, Valgrind receives an internal error and crashes with a segmentation fault. The exact error message is in "actual results" below.

Reproducible: Always

Steps to Reproduce:
1. Download the SVN version of Boost (release 1.51 may fail as well); the easiest command for that is "svn checkout https://svn.boost.org/svn/boost/trunk".
2. At the top level of the Boost source tree:
   a. Run "./bootstrap.sh; ./b2 libs/context/test".
   b. Run "find ./bin.v2/libs/context/test/ -perm 0755 -type f".
   c. Run Valgrind (the tools "none" and "callgrind" do not trigger the error, but "memcheck", "drd", and "helgrind" do) on the executable file printed by the find command.

Actual Results:
The output from memcheck is:

==13890== Memcheck, a memory error detector
==13890== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==13890== Using Valgrind-3.8.0 and LibVEX; rerun with -h for copyright info
==13890== Command: /u/jewillco/boost-svn/bin.v2/libs/context/test/test_context.test/gcc-4.7.1/debug/link-static/test_context
==13890==
Running 7 test cases...
==13890== Warning: client switching stacks? SP change: 0x7feffded8 --> 0x4f19ff8
==13890== to suppress, use: --max-stackframe=34260008672 or greater
==13890== Warning: client switching stacks? SP change: 0x4f19fd8 --> 0x7feffdee0
==13890== to suppress, use: --max-stackframe=34260008712 or greater
==13890== Warning: client switching stacks? SP change: 0x7feffde58 --> 0x4f1cff8
==13890== to suppress, use: --max-stackframe=34259996256 or greater
==13890== further instances of this message will not be shown.
--13890-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--13890-- si_code=1; Faulting address: 0x4F29000; sp: 0x402abeca0

valgrind: the 'impossible' happened:
   Killed by fatal signal
==13890==    at 0x3806A249: vgPlain_get_StackTrace_wrk (m_stacktrace.c:361)
==13890==    by 0x3806A386: vgPlain_get_StackTrace (m_stacktrace.c:1086)
==13890==    by 0x38053CF5: record_ExeContext_wrk (m_execontext.c:314)
==13890==    by 0x380280E7: vgMemCheck_new_block (mc_malloc_wrappers.c:280)
==13890==    by 0x380282FA: vgMemCheck_malloc (mc_malloc_wrappers.c:301)
==13890==    by 0x3809C470: vgPlain_scheduler (scheduler.c:1665)
==13890==    by 0x380AB619: run_a_thread_NORETURN (syswrap-linux.c:103)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
Segmentation fault (core dumped)

Expected Results:
Valgrind not crashing. Note that the warnings about stack switching are correct (the program actually does create and switch to new stacks).

I am running RHEL version (from /etc/redhat-release): Red Hat Enterprise Linux Workstation release 6.3 (Santiago) with a manually compiled version of Valgrind 3.8.0, with both Valgrind and the test programs built with a manually compiled version of GCC 4.7.1. The CPU is a quad-core, 8-thread Intel Xeon X5570 running in 64-bit mode. The kernel version is 2.6.32-279.2.1.el6.x86_64.
Note that Boost.Context creates its own guard pages below the stacks that it allocates. Also, I tried adding a call to VALGRIND_STACK_REGISTER on the Boost.Context-allocated stacks in a (different) test program, and that does not appear to work around the problem.
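For readers unfamiliar with the guard-page arrangement mentioned above, here is a minimal sketch of the general technique on Linux: map the stack with mmap and make the lowest page inaccessible with mprotect. This is not Boost's actual allocator code; the names guarded_stack, allocate_guarded_stack and free_guarded_stack are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper type for illustration; not part of Boost or Valgrind.
struct guarded_stack {
    void*       base;   // lowest mapped address (start of the guard page)
    std::size_t total;  // bytes mapped, guard page included
    void*       top;    // one past the highest byte; initial stack pointer on x86-64
};

// Maps `usable` bytes (rounded up to whole pages) plus one guard page below them.
// Returns a guarded_stack with base == nullptr on failure.
guarded_stack allocate_guarded_stack(std::size_t usable) {
    const std::size_t page  = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t pages = (usable + page - 1) / page;
    const std::size_t total = (pages + 1) * page;

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return guarded_stack{nullptr, 0, nullptr};
    }
    // Stacks grow downward on x86-64, so the guard is the lowest page:
    // running off the end of the stack faults instead of corrupting memory.
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, total);
        return guarded_stack{nullptr, 0, nullptr};
    }
    return guarded_stack{base, total, static_cast<char*>(base) + total};
}

// Unmaps the stack and its guard page, then clears the descriptor.
void free_guarded_stack(guarded_stack& s) {
    if (s.base != nullptr) {
        munmap(s.base, s.total);
    }
    s = guarded_stack{nullptr, 0, nullptr};
}
```

Any code that walks such a stack from the top downward without knowing where it ends will eventually read the PROT_NONE page and receive SIGSEGV.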
We're seeing the same thing at my company. We're using Boost.Context as part of a high performance server, so this means we can't use Valgrind on our product. Any help or insight anyone can give would be extremely helpful!
Yes, this is probably a legitimate bug that we should fix. Persuading Boost not to do stack switching would be a short-term workaround.
The goal of Boost.Context is to do the stack switching (to get user-level threads and similar constructs), so there is probably no way to avoid it.
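To make concrete what "stack switching" means here, the following sketch uses the POSIX ucontext functions (getcontext, makecontext, swapcontext) rather than Boost.Context itself, which has its own assembly implementation. It runs a function on a separately allocated stack and switches back and forth. The names main_ctx, coro_ctx, coro_body and run_stack_switch_demo are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <ucontext.h>
#include <vector>

namespace {

ucontext_t main_ctx;  // saved state of the original (main) stack
ucontext_t coro_ctx;  // saved state of the user-allocated stack
int steps = 0;        // counts how far coro_body has progressed

// Runs on the user-allocated stack.
void coro_body() {
    ++steps;
    swapcontext(&coro_ctx, &main_ctx);  // yield: jump back to the main stack
    ++steps;
    // Returning from here resumes coro_ctx.uc_link, i.e. main_ctx.
}

}  // namespace

// Switches to a second stack twice. Error checking of the ucontext calls
// is omitted to keep the sketch short.
int run_stack_switch_demo() {
    steps = 0;
    std::vector<char> stack(64 * 1024);  // memory used as the second stack

    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp   = stack.data();
    coro_ctx.uc_stack.ss_size = stack.size();
    coro_ctx.uc_link          = &main_ctx;
    makecontext(&coro_ctx, coro_body, 0);

    swapcontext(&main_ctx, &coro_ctx);  // SP jumps into `stack`
    const int after_first = steps;      // coro_body has yielded once
    swapcontext(&main_ctx, &coro_ctx);  // resume coro_body until it returns
    return after_first * 10 + steps;
}
```

Each swapcontext call changes the stack pointer by far more than any normal frame size, which is the kind of jump the "client switching stacks?" warnings in the report describe.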
The stack handling (allocating and deallocating stacks, plus creating guard pages) has been moved into Boost.Coroutine, while context switching between stacks is still in Boost.Context.
For anyone else who comes across this issue: There is code to help Valgrind understand coroutines. We've had some luck with VALGRIND_STACK_REGISTER() and VALGRIND_STACK_DEREGISTER(). See:

https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01579.html
https://github.com/acunu/libstutter/blob/master/stutter/coroutine.cpp
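One way to keep the register and deregister calls paired is a small RAII guard, sketched below. The class name valgrind_stack_registration and the macro HAVE_VALGRIND_CLIENT_REQUESTS are invented for this example; only VALGRIND_STACK_REGISTER and VALGRIND_STACK_DEREGISTER come from valgrind/valgrind.h. The sketch assumes a compiler that supports __has_include and falls back to no-op macros when the header is not installed.

```cpp
#include <cassert>
#include <cstddef>

#ifdef __has_include
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_CLIENT_REQUESTS 1  // hypothetical macro name
#  endif
#endif

#ifndef HAVE_VALGRIND_CLIENT_REQUESTS
// Fallback no-ops so the code still builds without the Valgrind headers.
#  define VALGRIND_STACK_REGISTER(start, end) \
      (static_cast<void>(start), static_cast<void>(end), 0u)
#  define VALGRIND_STACK_DEREGISTER(id) static_cast<void>(id)
#endif

// Hypothetical RAII guard: registers [lowest, lowest + size) as a stack on
// construction and deregisters it on destruction.
class valgrind_stack_registration {
    void*    lowest_;
    void*    highest_;
    unsigned id_;

public:
    valgrind_stack_registration(void* lowest, std::size_t size)
        : lowest_(lowest),
          highest_(static_cast<char*>(lowest) + size),
          id_(0) {
        id_ = VALGRIND_STACK_REGISTER(lowest_, highest_);
    }

    ~valgrind_stack_registration() {
        VALGRIND_STACK_DEREGISTER(id_);
    }

    // Non-copyable: each registration must be deregistered exactly once.
    valgrind_stack_registration(const valgrind_stack_registration&) = delete;
    valgrind_stack_registration& operator=(const valgrind_stack_registration&) = delete;

    void* lowest() const { return lowest_; }
    void* highest() const { return highest_; }
    unsigned id() const { return id_; }
};
```

The guard should live exactly as long as the stack memory it describes, for example as a member declared after the stack buffer in whatever object owns the coroutine. Note that the original report says registering the stacks did not avoid the crash in that setup, so this may not help in every configuration.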
Making a stack allocator wrapper for Boost.Coroutine worked for me. The wrapper is passed as the last constructor argument:

    boost::coroutines::coroutine<void>::pull_type(
        [pro, this] (boost::coroutines::coroutine<void>::push_type& new_push_ptr) {},
        boost::coroutines::attributes(),
        valgrind_stack_allocator());

The wrapper itself:

    #include <cassert>   // assert() is used below
    #ifdef HAVE_VALGRIND_H
    #include <unordered_map>
    #include <valgrind/valgrind.h>
    #endif

    // Wraps boost::coroutine::stack_allocator, and if Valgrind is installed
    // will register stacks, so that Valgrind is not confused.
    class valgrind_stack_allocator {
        boost::coroutines::stack_allocator allocator;
    #ifdef HAVE_VALGRIND_H
        // Maps each stack's sp to the id returned by VALGRIND_STACK_REGISTER.
        std::unordered_map<void*, unsigned> stack_ids;
    #endif

    public:
        static bool is_stack_unbound() {
            return boost::coroutines::stack_allocator::is_stack_unbound();
        }
        static std::size_t maximum_stacksize() {
            return boost::coroutines::stack_allocator::maximum_stacksize();
        }
        static std::size_t default_stacksize() {
            return boost::coroutines::stack_allocator::default_stacksize();
        }
        static std::size_t minimum_stacksize() {
            return boost::coroutines::stack_allocator::minimum_stacksize();
        }

        void allocate(boost::coroutines::stack_context & sc, std::size_t size) {
            allocator.allocate(sc, size);
    #ifdef HAVE_VALGRIND_H
            // sc.sp is the high end of the stack; the low end is sc.sp - sc.size.
            auto res = stack_ids.insert(
                std::make_pair(
                    sc.sp,
                    VALGRIND_STACK_REGISTER(sc.sp, (((char*)sc.sp) - sc.size))));
            (void)res;
            assert(res.second);
    #endif
        }

        void deallocate(boost::coroutines::stack_context & sc) {
    #ifdef HAVE_VALGRIND_H
            // Deregister before the memory is released.
            auto id = stack_ids.find(sc.sp);
            assert(id != stack_ids.end());
            VALGRIND_STACK_DEREGISTER(id->second);
            stack_ids.erase(id);
    #endif
            allocator.deallocate(sc);
        }
    };