306171 – Boost.Context appears to cause Valgrind to crash

Status:	REPORTED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (other bugs)
Version First Reported In:	3.8.0
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

Reported:	2012-09-02 18:41 UTC by Jeremiah Willcock
Modified:	2014-06-05 15:02 UTC
CC List:	3 users

Description Jeremiah Willcock 2012-09-02 18:41:54 UTC

When running programs that use Boost.Context on Linux on an x86-64 system, Valgrind reports an internal error and crashes with a segmentation fault. I am using the trunk version of Boost; the Boost.Context test program in libs/context/test/test_context.cpp triggers the issue. The exact error message is under "Actual Results" below.

Reproducible: Always

Steps to Reproduce:
1. Check out the SVN trunk of Boost (release 1.51 may fail as well), for example with "svn checkout https://svn.boost.org/svn/boost/trunk".
2. At the top level of the Boost source tree:
  a. Run "./bootstrap.sh; ./b2 libs/context/test".
  b. Run "find ./bin.v2/libs/context/test/ -perm 0755 -type f".
  c. Run Valgrind on the executable printed by the find command. The "memcheck", "drd", and "helgrind" tools trigger the error; "none" and "callgrind" do not.
Actual Results:  
The output from memcheck is:

==13890== Memcheck, a memory error detector
==13890== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==13890== Using Valgrind-3.8.0 and LibVEX; rerun with -h for copyright info
==13890== Command: /u/jewillco/boost-svn/bin.v2/libs/context/test/test_context.test/gcc-4.7.1/debug/link-static/test_context
==13890==
Running 7 test cases...
==13890== Warning: client switching stacks?  SP change: 0x7feffded8 --> 0x4f19ff8
==13890==          to suppress, use: --max-stackframe=34260008672 or greater
==13890== Warning: client switching stacks?  SP change: 0x4f19fd8 --> 0x7feffdee0
==13890==          to suppress, use: --max-stackframe=34260008712 or greater
==13890== Warning: client switching stacks?  SP change: 0x7feffde58 --> 0x4f1cff8
==13890==          to suppress, use: --max-stackframe=34259996256 or greater
==13890==          further instances of this message will not be shown.
--13890-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--13890-- si_code=1;  Faulting address: 0x4F29000;  sp: 0x402abeca0

valgrind: the 'impossible' happened:
   Killed by fatal signal
==13890==    at 0x3806A249: vgPlain_get_StackTrace_wrk (m_stacktrace.c:361)
==13890==    by 0x3806A386: vgPlain_get_StackTrace (m_stacktrace.c:1086)
==13890==    by 0x38053CF5: record_ExeContext_wrk (m_execontext.c:314)
==13890==    by 0x380280E7: vgMemCheck_new_block (mc_malloc_wrappers.c:280)
==13890==    by 0x380282FA: vgMemCheck_malloc (mc_malloc_wrappers.c:301)
==13890==    by 0x3809C470: vgPlain_scheduler (scheduler.c:1665)
==13890==    by 0x380AB619: run_a_thread_NORETURN (syswrap-linux.c:103)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
Segmentation fault (core dumped)


Expected Results:  
Valgrind should not crash. The warnings about stack switching are correct: the program really does create new stacks and switch to them.
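As a quick arithmetic check that the warnings describe genuine stack switches, the sketch below copies the three SP changes from the memcheck log and confirms at compile time that each suggested --max-stackframe value is exactly the distance the stack pointer jumped. That jump (roughly 32 GiB, between the thread's own stack near 0x7fefff... and a separately allocated stack near 0x4f1...) is far larger than Valgrind's default --max-stackframe of 2000000 bytes, which is why memcheck reports a stack switch instead of treating it as an ordinary frame. The struct and function names are illustrative only.

```cpp
#include <cstdint>

// One "client switching stacks?" warning from the memcheck output:
// the old SP, the new SP, and the --max-stackframe value Valgrind suggested.
struct sp_change {
    std::uint64_t from;
    std::uint64_t to;
    std::uint64_t suggested_max_stackframe;
};

constexpr sp_change warnings[] = {
    {0x7feffded8ULL, 0x4f19ff8ULL, 34260008672ULL},
    {0x4f19fd8ULL, 0x7feffdee0ULL, 34260008712ULL},
    {0x7feffde58ULL, 0x4f1cff8ULL, 34259996256ULL},
};

// Absolute distance between the two stack pointers, in bytes.
constexpr std::uint64_t sp_distance(const sp_change& c) {
    return c.from > c.to ? c.from - c.to : c.to - c.from;
}

// Each suggested value equals the size of the SP jump.
static_assert(sp_distance(warnings[0]) == warnings[0].suggested_max_stackframe, "warning 1");
static_assert(sp_distance(warnings[1]) == warnings[1].suggested_max_stackframe, "warning 2");
static_assert(sp_distance(warnings[2]) == warnings[2].suggested_max_stackframe, "warning 3");
```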

I am running RHEL version (from /etc/redhat-release):

Red Hat Enterprise Linux Workstation release 6.3 (Santiago)

with a manually compiled Valgrind 3.8.0; both Valgrind and the test programs were built with a manually compiled GCC 4.7.1. The CPU is a quad-core, 8-thread Intel Xeon X5570 running in 64-bit mode. The kernel version is 2.6.32-279.2.1.el6.x86_64.

Comment 1 Jeremiah Willcock 2012-09-02 18:46:13 UTC

Note that Boost.Context creates its own guard pages below the stacks that it allocates.  Also, I tried adding a call to VALGRIND_STACK_REGISTER on the Boost.Context-allocated stacks in a (different) test program, and that does not appear to work around the problem.
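For readers unfamiliar with the pattern described in Comment 1, here is a minimal sketch of a stack with a guard page below it, assuming Linux mmap/mprotect and a downward-growing stack as on x86-64. It is not Boost.Context's code, and the helper names (guarded_stack, allocate_guarded_stack, free_guarded_stack) are hypothetical. When <valgrind/valgrind.h> is available it registers only the usable part of the mapping with VALGRIND_STACK_REGISTER. As the comment notes, registration alone did not avoid the crash in this report, so treat it as the usual hygiene step, not a confirmed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_STACK_API 1
#  endif
#endif

// A stack with one inaccessible guard page at its lowest address; since the
// stack grows downwards, an overflow runs into the guard page and faults.
struct guarded_stack {
    void* base = nullptr;      // start of the whole mapping (the guard page)
    std::size_t mapped = 0;    // guard page + usable stack, in bytes
    void* top = nullptr;       // highest address: the initial stack pointer
    unsigned valgrind_id = 0;  // id returned by VALGRIND_STACK_REGISTER
};

// Hypothetical helper: maps `size` usable bytes (rounded up to whole pages)
// plus one guard page. Returns false if mmap or mprotect fails.
bool allocate_guarded_stack(guarded_stack& s, std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t usable = (size + page - 1) / page * page;
    const std::size_t total = usable + page;
    void* p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    // Make the lowest page inaccessible: this is the guard page.
    if (mprotect(p, page, PROT_NONE) != 0) {
        munmap(p, total);
        return false;
    }
    s.base = p;
    s.mapped = total;
    s.top = static_cast<char*>(p) + total;
#ifdef HAVE_VALGRIND_STACK_API
    // Register the usable range only, lowest address first.
    s.valgrind_id = VALGRIND_STACK_REGISTER(static_cast<char*>(p) + page,
                                            static_cast<char*>(s.top));
#endif
    return true;
}

// Hypothetical helper: deregisters and unmaps a stack from allocate_guarded_stack.
void free_guarded_stack(guarded_stack& s) {
    if (s.base == nullptr) return;
#ifdef HAVE_VALGRIND_STACK_API
    VALGRIND_STACK_DEREGISTER(s.valgrind_id);
#endif
    munmap(s.base, s.mapped);
    s = guarded_stack{};
}
```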

Comment 2 simonvalgrind 2012-11-15 03:45:11 UTC

We're seeing the same thing at my company. We're using Boost.Context as part of a high performance server, so this means we can't use Valgrind on our product.

Any help or insight anyone can give would be extremely helpful!

Comment 3 Julian Seward 2013-03-01 08:54:22 UTC

Yes, this is probably a legitimate bug that we should fix. Persuading Boost not to do
stack switching would be a short-term workaround.

Comment 4 Jeremiah Willcock 2013-03-01 18:44:39 UTC

The goal of Boost.Context is to do the stack switching (to get user-level threads and similar constructs), so there is probably no way to avoid it.
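To make the mechanism concrete, here is a hedged sketch of user-level stack switching. It uses POSIX ucontext (getcontext, makecontext, swapcontext) in place of Boost.Context's own switching code and assumes glibc on Linux; the names in namespace demo are hypothetical. Each swapcontext call moves the stack pointer between the thread's stack and a heap buffer, which is the kind of SP jump that produces the "client switching stacks?" warnings in the log above. The buffer is registered with Valgrind when the header is available.

```cpp
#include <cassert>
#include <cstddef>
#include <ucontext.h>
#include <vector>

#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_STACK_API 1
#  endif
#endif

namespace demo {

ucontext_t main_ctx;  // context of the caller
ucontext_t coro_ctx;  // context that runs on the separately allocated stack
int steps = 0;        // incremented once per resumed segment of the coroutine

// Coroutine body: runs on its own stack and yields back to the caller twice.
void coroutine_body() {
    ++steps;
    swapcontext(&coro_ctx, &main_ctx);  // yield #1
    ++steps;
    swapcontext(&coro_ctx, &main_ctx);  // yield #2
    ++steps;
    // Returning here resumes uc_link, i.e. main_ctx.
}

// Runs the coroutine to completion and returns how many segments executed.
int run() {
    std::vector<char> stack(64 * 1024);  // the coroutine's stack lives on the heap
    steps = 0;
#ifdef HAVE_VALGRIND_STACK_API
    unsigned id = VALGRIND_STACK_REGISTER(stack.data(), stack.data() + stack.size());
#endif
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = stack.data();
    coro_ctx.uc_stack.ss_size = stack.size();
    coro_ctx.uc_link = &main_ctx;
    makecontext(&coro_ctx, coroutine_body, 0);

    // Each call switches SP from the thread stack to `stack` and back.
    swapcontext(&main_ctx, &coro_ctx);  // runs until yield #1
    swapcontext(&main_ctx, &coro_ctx);  // runs until yield #2
    swapcontext(&main_ctx, &coro_ctx);  // runs until the body returns
#ifdef HAVE_VALGRIND_STACK_API
    VALGRIND_STACK_DEREGISTER(id);
#endif
    return steps;
}

}  // namespace demo
```

Calling demo::run() returns 3, because the body executes three segments separated by two yields.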

Comment 5 Jeremiah Willcock 2013-03-01 18:46:45 UTC

The stack handling (allocating and deallocating stacks, plus creating guard pages) has been moved into Boost.Coroutine, while context switching between stacks is still in Boost.Context.

Comment 6 simonvalgrind 2014-03-31 22:54:23 UTC

For anyone else who comes across this issue:
Valgrind has client requests that help it understand coroutines; we've had some luck with VALGRIND_STACK_REGISTER() and VALGRIND_STACK_DEREGISTER().
See: 
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01579.html
https://github.com/acunu/libstutter/blob/master/stutter/coroutine.cpp
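Pairing VALGRIND_STACK_REGISTER with VALGRIND_STACK_DEREGISTER is easy to get wrong when a stack can be freed on several code paths. A small RAII wrapper is one way to tie the two together. The class name valgrind_stack_registration is hypothetical, and the sketch assumes the caller passes the lowest address of the stack and its size in bytes. It compiles to a no-op when <valgrind/valgrind.h> is not installed, and the client requests themselves do nothing when the program is not running under Valgrind.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_STACK_API 1
#  endif
#endif

// Hypothetical RAII helper: registers [lo, lo + size) as a stack with Valgrind
// for the lifetime of the object and deregisters it in the destructor.
class valgrind_stack_registration {
public:
    valgrind_stack_registration(void* lo, std::size_t size) {
#ifdef HAVE_VALGRIND_STACK_API
        char* low = static_cast<char*>(lo);
        id_ = VALGRIND_STACK_REGISTER(low, low + size);
        active_ = true;
#else
        (void)lo;
        (void)size;
#endif
    }

    ~valgrind_stack_registration() {
#ifdef HAVE_VALGRIND_STACK_API
        if (active_) VALGRIND_STACK_DEREGISTER(id_);
#endif
    }

    // The object owns a Valgrind stack id, so it must not be copied.
    valgrind_stack_registration(const valgrind_stack_registration&) = delete;
    valgrind_stack_registration& operator=(const valgrind_stack_registration&) = delete;

    // True if a registration request was issued.
    bool active() const { return active_; }

private:
    unsigned id_ = 0;
    bool active_ = false;
};
```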

Comment 7 Morten Bendiksen 2014-06-05 15:02:53 UTC

Writing a stack allocator wrapper for Boost.Coroutine that registers each stack with Valgrind made it work for me:

boost::coroutines::coroutine<void>::pull_type source(
    [](boost::coroutines::coroutine<void>::push_type& sink) {
        // coroutine body; call sink() to yield back to the caller
    },
    boost::coroutines::attributes(),
    valgrind_stack_allocator());

#include <cassert>
#include <cstddef>
#include <boost/coroutine/all.hpp>

#ifdef HAVE_VALGRIND_H
#include <unordered_map>
#include <valgrind/valgrind.h>
#endif

// Wraps boost::coroutines::stack_allocator and, if Valgrind's header is
// available, registers each stack so that Valgrind is not confused by the
// stack switches.
class valgrind_stack_allocator {
    boost::coroutines::stack_allocator allocator;
#ifdef HAVE_VALGRIND_H
    // Maps a stack's top address (sc.sp) to the id returned by Valgrind.
    std::unordered_map<void*, unsigned> stack_ids;
#endif

public:
    static bool is_stack_unbound() {
        return boost::coroutines::stack_allocator::is_stack_unbound();
    }

    static std::size_t maximum_stacksize() {
        return boost::coroutines::stack_allocator::maximum_stacksize();
    }

    static std::size_t default_stacksize() {
        return boost::coroutines::stack_allocator::default_stacksize();
    }

    static std::size_t minimum_stacksize() {
        return boost::coroutines::stack_allocator::minimum_stacksize();
    }

    void allocate(boost::coroutines::stack_context& sc, std::size_t size) {
        allocator.allocate(sc, size);
#ifdef HAVE_VALGRIND_H
        // sc.sp is the top of the stack; it grows down to sc.sp - sc.size.
        char* top = static_cast<char*>(sc.sp);
        auto res = stack_ids.insert(std::make_pair(
            sc.sp, VALGRIND_STACK_REGISTER(top - sc.size, top)));
        assert(res.second);
        (void)res;  // avoid an unused-variable warning when NDEBUG is set
#endif
    }

    void deallocate(boost::coroutines::stack_context& sc) {
#ifdef HAVE_VALGRIND_H
        auto id = stack_ids.find(sc.sp);
        assert(id != stack_ids.end());
        VALGRIND_STACK_DEREGISTER(id->second);
        stack_ids.erase(id);
#endif
        allocator.deallocate(sc);
    }
};