Bug 197988 - Crash when demangling very large symbol names.
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: callgrind
Version: 3.4.1
Platform: Ubuntu Linux
Importance: NOR crash
Target Milestone: wanted3.6.0
Assignee: Josef Weidendorfer
Duplicates: 204572 217863 240488
Reported: 2009-06-26 20:00 UTC by pistmaster
Modified: 2021-05-10 23:41 UTC

Attachments
Complete run log (59.17 KB, text/plain)
2009-07-14 14:33 UTC, Mishael A Sibiryakov

Description pistmaster 2009-06-26 20:00:42 UTC
Valgrind crashes when using callgrind/cachegrind on an exe which contains very long symbol names.

Example program to reproduce:
#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

struct grammar
{
    sregex _tuple;
    sregex _element;
    sregex _identifier;

    grammar()
    {
        _tuple = '(' >> by_ref(_element) >> *(',' >> by_ref(_element)) >> ')';
        _element = by_ref(_identifier) | _tuple;
        _identifier = (_w | '_') >> *(_w | _d | '_');
    }
};

int main()
{
    grammar g;
    if (regex_match(std::string("(a,(b,c))"), g._tuple))
        std::cout << "match" << std::endl;
    else
        std::cout << "unmatched" << std::endl;
    return 0;
}

Results of running valgrind with "valgrind --tool=callgrind -v -d ./test1" on said program:

--32649:1:debuglog DebugLog system started by Stage 1, level 1 logging requested
--32649:1:launcher tool 'callgrind' requested
--32649:1:launcher selected platform 'amd64-linux'
--32649:1:launcher launching /usr/lib/valgrind/amd64-linux/callgrind
--32649:1:debuglog DebugLog system started by Stage 2 (main), level 1 logging requested
--32649:1:main     Welcome to Valgrind version 3.4.1-Debian debug logging
--32649:1:main     Checking current stack is plausible
--32649:1:main     Checking initial stack was noted
--32649:1:main     Starting the address space manager
--32649:1:main     Address space manager is running
--32649:1:main     Starting the dynamic memory manager
--32649:1:mallocfr newSuperblock at 0x402001000 (pszB 4194280) owner VALGRIND/tool
--32649:1:main     Dynamic memory manager is running
--32649:1:main     Initialise m_debuginfo
--32649:1:main     Getting stage1's name
--32649:1:main     Get hardware capabilities ...
--32649:1:main     ... arch = AMD64, hwcaps = amd64-sse2
--32649:1:main     Getting the working directory at startup
--32649:1:main     ... /home/stuff/stuff/src/xpressive
--32649:1:main     Split up command line
--32649:1:main     (early_) Process Valgrind's command line options
--32649:1:main     Create initial image
--32649:1:initimg  Loading client
--32649:1:initimg  Setup client env
--32649:1:initimg  Setup client stack: size will be 8388608
--32649:1:initimg  Setup client data (brk) segment
--32649:1:main     Setup file descriptors
--32649:1:main     Create fake /proc/<pid>/cmdline
--32649:1:main     Initialise the tool part 1 (pre_clo_init)
--32649:1:main     Print help and quit, if requested
--32649:1:main     (main_) Process Valgrind's command line options, setup logging
--32649:1:main     Print the preamble...
==32649== Callgrind, a call-graph generating cache profiler.
==32649== Copyright (C) 2002-2008, and GNU GPL'd, by Josef Weidendorfer et al.
==32649== Using LibVEX rev 1884, a library for dynamic binary translation.
==32649== Copyright (C) 2004-2008, and GNU GPL'd, by Openstuffs LLP.
==32649== Using valgrind-3.4.1-Debian, a dynamic binary instrumentation framework.
==32649== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==32649== 
--32649-- Command line
--32649--    ./test1
--32649-- Startup, with flags:
--32649--    --suppressions=/usr/lib/valgrind/debian-libc6-dbg.supp
--32649--    --tool=callgrind
--32649--    -v
--32649--    -d
--32649-- Contents of /proc/version:
--32649--   Linux version 2.6.28-13-generic (buildd@yellow) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #44-Ubuntu SMP Tue Jun 2 07:55:09 UTC 2009
--32649-- Arch and hwcaps: AMD64, amd64-sse2
--32649-- Page sizes: currently 4096, max supported 4096
--32649-- Valgrind library directory: /usr/lib/valgrind
--32649:1:main     ...finished the preamble
--32649:1:main     Initialise the tool part 2 (post_clo_init)
==32649== For interactive control, run 'callgrind_control -h'.
--32649:1:main     Initialise TT/TC
--32649:1:main     Initialise redirects
--32649:1:mallocfr newSuperblock at 0x40247C000 (pszB 1048552) owner VALGRIND/dinfo
--32649:1:main     Load initial debug info
--32649-- Reading syms from /home/stuff/stuff/src/xpressive/test1 (0x400000)
--32649-- Reading syms from /lib/ld-2.9.so (0x4000000)
--32649-- Reading debug info from /lib/ld-2.9.so ..
--32649-- .. CRC mismatch (computed 2aeecd7e wanted a5503f5d)
--32649-- Reading debug info from /usr/lib/debug/lib/ld-2.9.so ..
--32649:1:mallocfr newSuperblock at 0x40257C000 (pszB 1048552) owner VALGRIND/dinfo
--32649-- Reading syms from /usr/lib/valgrind/amd64-linux/callgrind (0x38000000)
--32649--    object doesn't have a dynamic symbol table
--32649:1:mallocfr newSuperblock at 0x40267C000 (pszB 1048552) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x40277C000 (pszB 1048552) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x40287C000 (pszB 2052072) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x402A71000 (pszB 4100072) owner VALGRIND/dinfo
--32649:1:redir    transfer ownership V -> C of 0x3802c000 .. 0x3802cfff
--32649:1:main     Initialise scheduler (phase 1)
--32649:1:sched    sched_init_phase1
--32649:1:main     Tell tool about initial permissions
--32649:1:main     Initialise scheduler (phase 2)
--32649:1:sched    sched_init_phase2: tid_main=1, cls_end=0x7ff000fff, cls_sz=8388608
--32649:1:main     Finalise initial image
--32649:1:main     Initialise signal management
--32649:1:mallocfr newSuperblock at 0x402E5A000 (pszB 1048552) owner VALGRIND/core
--32649:1:main     
--32649:1:main     
--32649:1:aspacem  <<< SHOW_SEGMENTS: Memory layout at client startup (31 segments, 4 segnames)
--32649:1:aspacem  ( 0) /usr/lib/valgrind/amd64-linux/callgrind
--32649:1:aspacem  ( 1) /home/stuff/stuff/src/xpressive/test1
--32649:1:aspacem  ( 2) /lib/ld-2.9.so
--32649:1:aspacem    0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--32649:1:aspacem    1: file 0000400000-0000419fff  106496 r-x-- d=0x803 i=73705   o=0       (1)
--32649:1:aspacem    2: RSVN 000041a000-0000618fff 2093056 ----- SmFixed
--32649:1:aspacem    3: file 0000619000-000061afff    8192 rw--- d=0x803 i=73705   o=102400  (1)
--32649:1:aspacem    4: RSVN 000061b000-0003ffffff     57m ----- SmFixed
--32649:1:aspacem    5: file 0004000000-000401ffff  131072 r-x-- d=0x804 i=516463  o=0       (2)
--32649:1:aspacem    6:      0004020000-000421efff 2093056
--32649:1:aspacem    7: file 000421f000-0004220fff    8192 rw--- d=0x804 i=516463  o=126976  (2)
--32649:1:aspacem    8: anon 0004221000-0004221fff    4096 rwx--
--32649:1:aspacem    9: RSVN 0004222000-0004a20fff 8384512 ----- SmLower
--32649:1:aspacem   10:      0004a21000-0037ffffff    821m
--32649:1:aspacem   11: FILE 0038000000-003802bfff  180224 r-x-- d=0x804 i=65879   o=0       (0)
--32649:1:aspacem   12: file 003802c000-003802cfff    4096 r-x-- d=0x804 i=65879   o=180224  (0)
--32649:1:aspacem   13: FILE 003802d000-00381c2fff 1662976 r-x-- d=0x804 i=65879   o=184320  (0)
--32649:1:aspacem   14:      00381c3000-00383c1fff 2093056
--32649:1:aspacem   15: FILE 00383c2000-00383c3fff    8192 rw--- d=0x804 i=65879   o=1843200 (0)
--32649:1:aspacem   16: ANON 00383c4000-0038b72fff 8056832 rw---
--32649:1:aspacem   17:      0038b73000-0401ffffff  15508m
--32649:1:aspacem   18: RSVN 0402000000-0402000fff    4096 ----- SmFixed
--32649:1:aspacem   19: ANON 0402001000-0402f59fff     15m rwx--
--32649:1:aspacem   20:      0402f5a000-07fe800fff  16312m
--32649:1:aspacem   21: RSVN 07fe801000-07feffefff 8380416 ----- SmUpper
--32649:1:aspacem   22: anon 07fefff000-07ff000fff    8192 rwx--
--32649:1:aspacem   23:      07ff001000-07ffffffff     15m
--32649:1:aspacem   24: RSVN 0800000000-7fffe7455fff 131039g ----- SmFixed
--32649:1:aspacem   25: ANON 7fffe7456000-7fffe746afff   86016 rw---
--32649:1:aspacem   26: RSVN 7fffe746b000-7fffe75fefff 1654784 ----- SmFixed
--32649:1:aspacem   27: ANON 7fffe75ff000-7fffe75fffff    4096 r-x--
--32649:1:aspacem   28: RSVN 7fffe7600000-ffffffffff5fffff  16383e ----- SmFixed
--32649:1:aspacem   29: ANON ffffffffff600000-ffffffffff600fff    4096 r-x--
--32649:1:aspacem   30: RSVN ffffffffff601000-ffffffffffffffff      9m ----- SmFixed
--32649:1:aspacem  >>>
--32649:1:main     
--32649:1:main     
--32649:1:main     Running thread 1
--32649:1:syswrap- entering VG_(main_thread_wrapper_NORETURN)
--32649:1:aspacem  allocated thread stack at 0x402f5a000 size 81920
--32649:1:syswrap- run_a_thread_NORETURN(tid=1): pre-thread_wrapper
--32649:1:syswrap- thread_wrapper(tid=1): entry
--32649-- Found runtime_resolve (amd64-def): ld-2.9.so +0x14ac0=0x4015550, length 110
--32649:1:transtab allocate sector 0
--32649:1:mallocfr newSuperblock at 0x404E86000 (pszB   65512) owner VALGRIND/ttaux
--32649-- Reading syms from /usr/lib/valgrind/amd64-linux/vgpreload_core.so (0x4a21000)
--32649-- Reading syms from /usr/lib/debug/libstdc++.so.6.0.10 (0x4c23000)
--32649-- Reading syms from /lib/libm-2.9.so (0x4f51000)
--32649-- Reading debug info from /lib/libm-2.9.so ..
--32649-- .. CRC mismatch (computed c14b75b7 wanted 1d0b308f)
--32649-- Reading debug info from /usr/lib/debug/lib/libm-2.9.so ..
--32649:1:mallocfr newSuperblock at 0x404E96000 (pszB 1048552) owner VALGRIND/dinfo
--32649-- Reading syms from /lib/libgcc_s.so.1 (0x51d6000)
--32649-- Reading debug info from /lib/libgcc_s.so.1 ..
--32649-- .. CRC mismatch (computed 5e1e8b97 wanted 48127249)
--32649-- Reading debug info from /usr/lib/debug/lib/libgcc_s.so.1 ..
--32649-- Reading syms from /lib/libc-2.9.so (0x53ee000)
--32649-- Reading debug info from /lib/libc-2.9.so ..
--32649-- .. CRC mismatch (computed bb7f9209 wanted dcd904d7)
--32649-- Reading debug info from /usr/lib/debug/lib/libc-2.9.so ..
--32649:1:mallocfr newSuperblock at 0x404F96000 (pszB 1048552) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x405096000 (pszB 1048552) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x405196000 (pszB 2052072) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x40538B000 (pszB 4100072) owner VALGRIND/dinfo
--32649:1:mallocfr newSuperblock at 0x405774000 (pszB   65512) owner VALGRIND/demangle
--32649-- Symbol match: found runtime_resolve: ld-2.9.so +0x15550=0x4015550
Segmentation fault
Comment 1 Josef Weidendorfer 2009-06-26 20:41:18 UTC
This is a stack underrun in the Valgrind demangler; the stack
space reserved for Valgrind tools on AMD64 is 64k, and
this is obviously too small to run Boost programs.

In coregrind/pub_core_aspacemgr.h, set VG_STACK_ACTIVE_SZB
e.g. to 131072. I will leave it to Julian to decide about the size.
Comment 2 Nicholas Nethercote 2009-06-29 06:20:55 UTC
Do you know the name of the stack variable that is being overrun?  Seems like a good candidate for using dynamic memory allocation if possible.
Comment 3 Josef Weidendorfer 2009-06-29 16:35:06 UTC
Some further analysis:
Going up the backtrace in the debugger, the biggest stack allocation
happens in d_demangle_callback, and this one allocates two variable
sized arrays on the stack (either using a GNU extension or alloca):

  di.comps = alloca (di.num_comps * sizeof (*di.comps));
  di.subs = alloca (di.num_subs * sizeof (*di.subs));

For the symbol where I get the segmentation fault, I have on amd64:

 di.num_comps = 1576 with sizeof(*di.comps) = 24
 di.num_subs = 788 with sizeof(*di.subs) = 8

So this needs around 43KB. Together with a deep recursion
(24 recursive calls of d_print_comp with 256 bytes stack frame difference),
the demangler alone needs around 50KB in this case.
BTW, the mangled string in this case already has a length of 788!!
Nevertheless, as the test case is just simple Boost code, we should
handle it.

I agree that dynamic allocation would be better.
The patch should be simple.
Comment 4 Josef Weidendorfer 2009-06-29 18:21:31 UTC
I just committed a fix for this in r10385, which
replaces the stack allocations mentioned in comment #3
with xmalloc/free pairs (which macros map to the
corresponding VG calls).

This fixes the problem here. If you still see an issue,
please reopen this bug.
Comment 5 Nicholas Nethercote 2009-06-30 03:14:17 UTC
Thanks, Josef!  And a 3.5.0 blocker, too :)
Comment 6 Mishael A Sibiryakov 2009-07-14 01:53:44 UTC
This does not look fixed, or something else went wrong.

Sample program:
#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

static const sregex regex  =
        optional(
                ((s1 = +set[alpha | '.' | '_' | digit]) >> '@') |
                ((s1 = +set[alpha | '.' | '_' | digit]) >> ':' >> (s2 = +set[alpha | '.' | '_' | digit]) >> '@')
                )
        >> (s3 = +set[alpha | '.' | '_' | digit])
        >> optional(':' >> (s4 = +_d))
        >> (s5 = '/' >> *~_n);

int main()
{
        return 0;
}

Output of: valgrind --tool=callgrind -v -d ./test
--10265:1:debuglog DebugLog system started by Stage 1, level 1 logging requested
--10265:1:launcher tool 'callgrind' requested
--10265:1:launcher selected platform 'amd64-linux'
--10265:1:launcher launching /usr/lib64/valgrind/callgrind-amd64-linux
--10265:1:debuglog DebugLog system started by Stage 2 (main), level 1 logging requested
--10265:1:main     Welcome to Valgrind version 3.5.0.SVN debug logging
--10265:1:main     Checking current stack is plausible
--10265:1:main     Checking initial stack was noted
--10265:1:main     Starting the address space manager
--10265:1:main     Address space manager is running
--10265:1:main     Starting the dynamic memory manager
--10265:1:mallocfr newSuperblock at 0x402001000 (pszB 4194272) owner VALGRIND/tool
--10265:1:main     Dynamic memory manager is running
--10265:1:main     Initialise m_debuginfo
--10265:1:main     Getting launcher's name ...
--10265:1:main     ... /usr/bin/valgrind
--10265:1:main     Get hardware capabilities ...
--10265:1:main     ... arch = AMD64, hwcaps = amd64-sse3-cx16
--10265:1:main     Getting the working directory at startup
--10265:1:main     ... /work/Development/shit/vgc
--10265:1:main     Split up command line
--10265:1:main     (early_) Process Valgrind's command line options
--10265:1:main     Create initial image
--10265:1:initimg  Loading client
--10265:1:initimg  Setup client env
--10265:1:initimg  Setup client stack: size will be 8388608
--10265:1:initimg  Setup client data (brk) segment
--10265:1:main     Setup file descriptors
--10265:1:main     Create fake /proc/<pid>/cmdline
--10265:1:main     Initialise the tool part 1 (pre_clo_init)
--10265:1:main     Print help and quit, if requested
--10265:1:main     (main_) Process Valgrind's command line options, setup logging
--10265:1:main     Print the preamble...
==10265== Callgrind, a call-graph generating cache profiler.
==10265== Copyright (C) 2002-2009, and GNU GPL'd, by Josef Weidendorfer et al.
==10265== Using LibVEX rev 1908, a library for dynamic binary translation.
==10265== Copyright (C) 2004-2009, and GNU GPL'd, by OpenWorks LLP.
==10265== Using valgrind-3.5.0.SVN, a dynamic binary instrumentation framework.
==10265== Copyright (C) 2000-2009, and GNU GPL'd, by Julian Seward et al.
==10265==
--10265-- Command line
--10265--    ./test
--10265-- Startup, with flags:
--10265--    --tool=callgrind
--10265--    -v
--10265--    -d
--10265-- Contents of /proc/version:
--10265--   Linux version 2.6.29-gentoo-r5 (root@msd) (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)) #1 SMP Mon Jun 8 15:40:03 MSD 2009
--10265-- Arch and hwcaps: AMD64, amd64-sse3-cx16
--10265-- Page sizes: currently 4096, max supported 4096
--10265-- Valgrind library directory: /usr/lib64/valgrind
--10265:1:main     ...finished the preamble
--10265:1:main     Initialise the tool part 2 (post_clo_init)
==10265== For interactive control, run 'callgrind_control -h'.
--10265:1:main     Initialise TT/TC
--10265:1:main     Initialise redirects
--10265:1:mallocfr newSuperblock at 0x40247C000 (pszB 1048544) owner VALGRIND/dinfo
--10265:1:main     Load initial debug info
--10265-- Reading syms from /work/Development/shit/vgc/test (0x400000)
--10265:1:mallocfr newSuperblock at 0x40257C000 (pszB 1048544) owner VALGRIND/dinfo
--10265-- Reading syms from /lib64/ld-2.10.1.so (0x4000000)
--10265--    object doesn't have a symbol table
--10265-- Reading syms from /usr/lib64/valgrind/callgrind-amd64-linux (0x38000000)
--10265--    object doesn't have a symbol table
--10265--    object doesn't have a dynamic symbol table
--10265:1:mallocfr newSuperblock at 0x40267C000 (pszB 1048544) owner VALGRIND/dinfo
--10265:1:redir    transfer ownership V -> C of 0x3802b000 .. 0x3802bfff
--10265:1:main     Initialise scheduler (phase 1)
--10265:1:sched    sched_init_phase1
--10265:1:main     Tell tool about initial permissions
--10265:1:main     Initialise scheduler (phase 2)
--10265:1:sched    sched_init_phase2: tid_main=1, cls_end=0x7ff000fff, cls_sz=8388608
--10265:1:main     Finalise initial image
--10265:1:main     Initialise signal management
--10265:1:mallocfr newSuperblock at 0x40277C000 (pszB 1048544) owner VALGRIND/core
--10265:1:main
--10265:1:main
--10265:1:aspacem  <<< SHOW_SEGMENTS: Memory layout at client startup (31 segments, 3 segnames)
--10265:1:aspacem  ( 0) /usr/lib64/valgrind/callgrind-amd64-linux
--10265:1:aspacem  ( 1) /work/Development/shit/vgc/test
--10265:1:aspacem  ( 2) /lib64/ld-2.10.1.so
--10265:1:aspacem    0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--10265:1:aspacem    1: file 0000400000-0000430fff  200704 r-x-- d=0x807 i=7274619 o=0       (1)
--10265:1:aspacem    2: RSVN 0000431000-000062ffff 2093056 ----- SmFixed
--10265:1:aspacem    3: file 0000630000-0000631fff    8192 rw--- d=0x807 i=7274619 o=196608  (1)
--10265:1:aspacem    4: RSVN 0000632000-0003ffffff     57m ----- SmFixed
--10265:1:aspacem    5: file 0004000000-000401bfff  114688 r-x-- d=0x805 i=4375412 o=0       (2)
--10265:1:aspacem    6:      000401c000-000421afff 2093056
--10265:1:aspacem    7: file 000421b000-000421cfff    8192 rw--- d=0x805 i=4375412 o=110592  (2)
--10265:1:aspacem    8: anon 000421d000-000421dfff    4096 rwx--
--10265:1:aspacem    9: RSVN 000421e000-0004a1cfff 8384512 ----- SmLower
--10265:1:aspacem   10:      0004a1d000-0037ffffff    821m
--10265:1:aspacem   11: FILE 0038000000-003802afff  176128 r-x-- d=0x805 i=317918  o=0       (0)
--10265:1:aspacem   12: file 003802b000-003802bfff    4096 r-x-- d=0x805 i=317918  o=176128  (0)
--10265:1:aspacem   13: FILE 003802c000-00381b9fff 1630208 r-x-- d=0x805 i=317918  o=180224  (0)
--10265:1:aspacem   14:      00381ba000-00383b8fff 2093056
--10265:1:aspacem   15: FILE 00383b9000-00383bafff    8192 rw--- d=0x805 i=317918  o=1806336 (0)
--10265:1:aspacem   16: ANON 00383bb000-0038c01fff 8679424 rw---
--10265:1:aspacem   17:      0038c02000-0401ffffff  15507m
--10265:1:aspacem   18: RSVN 0402000000-0402000fff    4096 ----- SmFixed
--10265:1:aspacem   19: ANON 0402001000-040287bfff 8892416 rwx--
--10265:1:aspacem   20:      040287c000-07fe800fff  16319m
--10265:1:aspacem   21: RSVN 07fe801000-07feffdfff 8376320 ----- SmUpper
--10265:1:aspacem   22: anon 07feffe000-07ff000fff   12288 rwx--
--10265:1:aspacem   23:      07ff001000-07ffffffff     15m
--10265:1:aspacem   24: RSVN 0800000000-7fff1cf6cfff 131036g ----- SmFixed
--10265:1:aspacem   25: ANON 7fff1cf6d000-7fff1cf82fff   90112 rw---
--10265:1:aspacem   26: RSVN 7fff1cf83000-7fff1cffcfff  499712 ----- SmFixed
--10265:1:aspacem   27: ANON 7fff1cffd000-7fff1cffdfff    4096 r-x--
--10265:1:aspacem   28: RSVN 7fff1cffe000-ffffffffff5fffff  16383e ----- SmFixed
--10265:1:aspacem   29: ANON ffffffffff600000-ffffffffff600fff    4096 r-x--
--10265:1:aspacem   30: RSVN ffffffffff601000-ffffffffffffffff      9m ----- SmFixed
--10265:1:aspacem  >>>
--10265:1:main
--10265:1:main
--10265:1:main     Running thread 1
--10265:1:syswrap- entering VG_(main_thread_wrapper_NORETURN)
--10265:1:aspacem  allocated thread stack at 0x40287c000 size 81920
--10265:1:syswrap- run_a_thread_NORETURN(tid=1): pre-thread_wrapper
--10265:1:syswrap- thread_wrapper(tid=1): entry
--10265:1:transtab allocate sector 0
--10265:1:mallocfr newSuperblock at 0x4047A8000 (pszB   65504) owner VALGRIND/ttaux
--10265-- Reading syms from /usr/lib64/valgrind/vgpreload_core-amd64-linux.so (0x4a1d000)
--10265--    object doesn't have a symbol table
--10265-- Reading syms from /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6.0.8 (0x4c1f000)
--10265--    object doesn't have a symbol table
--10265-- Reading syms from /lib64/libm-2.10.1.so (0x4f22000)
--10265--    object doesn't have a symbol table
--10265-- Reading syms from /lib64/libgcc_s.so.1 (0x51a3000)
--10265--    object doesn't have a symbol table
--10265-- Reading syms from /lib64/libc-2.10.1.so (0x53b1000)
--10265--    object doesn't have a symbol table
--10265:1:mallocfr newSuperblock at 0x4047B8000 (pszB 1048544) owner VALGRIND/dinfo
--10265:1:mallocfr newSuperblock at 0x4048B8000 (pszB 1048544) owner VALGRIND/dinfo
--10265:1:mallocfr newSuperblock at 0x4049B8000 (pszB   65504) owner VALGRIND/demangle
--10265:1:mallocfr newSuperblock at 0x4049C8000 (pszB   65504) owner VALGRIND/demangle
Segmentation fault

Reproduced on r10440.
Comment 7 Josef Weidendorfer 2009-07-14 14:08:23 UTC
Sorry, I cannot reproduce this here on a 64-bit x86 machine (at r10451).
At the place of your SEGFAULT, I have no problems (no idea about the CRC
mismatch warnings):

=======================================================================
...
--5382:1:main
--5382:1:main
--5382:1:main     Running thread 1
--5382:1:syswrap- entering VG_(main_thread_wrapper_NORETURN)
--5382:1:aspacem  allocated thread stack at 0x402f5a000 size 81920
--5382:1:syswrap- run_a_thread_NORETURN(tid=1): pre-thread_wrapper
--5382:1:syswrap- thread_wrapper(tid=1): entry
--5382-- Found runtime_resolve (amd64-def): ld-2.7.so +0x13370=0x4013dd0, length 110
--5382:1:transtab allocate sector 0
--5382:1:mallocfr newSuperblock at 0x404E86000 (pszB   65504) owner VALGRIND/ttaux
--5382-- Reading syms from /home/weidendo/x86_64/lib/valgrind/vgpreload_core-amd64-linux.so (0x4a1f000)
--5382-- Reading syms from /usr/lib/libstdc++.so.6.0.9 (0x4c20000)
--5382-- Reading debug info from /usr/lib/libstdc++.so.6.0.9 ..
--5382-- .. CRC mismatch (computed cffb6542 wanted 4e57faa1)
--5382--    object doesn't have a symbol table
--5382-- Reading syms from /lib/libm-2.7.so (0x4f2b000)
--5382-- Reading debug info from /lib/libm-2.7.so ..
--5382-- .. CRC mismatch (computed e491af1c wanted a4e95324)
--5382--    object doesn't have a symbol table
--5382-- Reading syms from /lib/libgcc_s.so.1 (0x51ac000)
--5382-- Reading debug info from /lib/libgcc_s.so.1 ..
--5382-- .. CRC mismatch (computed 068ceb1e wanted 5861faf2)
--5382--    object doesn't have a symbol table
--5382-- Reading syms from /lib/libc-2.7.so (0x53ba000)
--5382-- Reading debug info from /lib/libc-2.7.so ..
--5382-- .. CRC mismatch (computed cb7b9635 wanted 11d14124)
--5382--    object doesn't have a symbol table
--5382-- Symbol match: found runtime_resolve: ld-2.7.so +0x13dd0=0x4013dd0
--5382:1:mallocfr newSuperblock at 0x404E96000 (pszB   65504) owner VALGRIND/demangle
--5382:1:mallocfr newSuperblock at 0x404EA6000 (pszB   65504) owner VALGRIND/demangle
--5382:1:signals  extending a stack base 0x7fefff000 down by 4096
--5382:1:signals  extending a stack base 0x7feffe000 down by 4096
--5382:1:mallocfr newSuperblock at 0x404EB6000 (pszB 4194272) owner VALGRIND/tool
--5382:1:mallocfr newSuperblock at 0x4052B6000 (pszB   65504) owner VALGRIND/demangle
--5382:1:mallocfr newSuperblock at 0x4052C6000 (pszB   73696) owner VALGRIND/demangle
--5382:1:syswrap- thread_wrapper(tid=1): exit
--5382:1:syswrap- run_a_thread_NORETURN(tid=1): post-thread_wrapper
--5382:1:syswrap- run_a_thread_NORETURN(tid=1): last one standing
--5382:1:main     entering VG_(shutdown_actions_NORETURN)
...
=======================================================================

Can you run it in gdb like this? The first line is needed to be able
to call the tool directly. Also, you should not strip your
  /usr/lib64/valgrind/callgrind-amd64-linux

> export VALGRIND_LAUNCHER=foo
> gdb /usr/lib64/valgrind/callgrind-amd64-linux
> r --tool=callgrind -v -d ./test

It is normal that there will be a segfault at this point, something like

 Program received signal SIGSEGV, Segmentation fault.
 0x0000000402fed2a4 in ?? ()

Type "ni" for "next instruction" at the gdb prompt. If you get
something like

 sync_signalhandler (sigNo=11, info=0x402f6bb10, uc=0x402f6b9e0) at m_signals.c:2297

you can type "c". This is the normal segfault handler of Valgrind.
If it does not show "sync_signalhandler", that should be the place
of the real segfault you see in the test case.
I am interested in a backtrace: "bt"

Thanks,
Josef
Comment 8 Mishael A Sibiryakov 2009-07-14 14:23:05 UTC
Hi.
I've tried to check this problem myself (I ran gdb directly on valgrind, without VALGRIND_LAUNCHER etc.). Anyway, the result is the same :)

It crashed:
Program received signal SIGSEGV, Segmentation fault.
0x0000000038096197 in d_print_comp (dpi=0x403069150, dc=0x404f96030) at m_demangle/cp-demangle.c:3256
3256    m_demangle/cp-demangle.c: No such file or directory.
        in m_demangle/cp-demangle.c
(gdb) p *dc
$2 = {type = DEMANGLE_COMPONENT_NAME, u = {s_name = {
      s = 0x4024f3c37 "boost5proto7exprns_rsINS1_4exprINS0_3tag11shift_rightENS0_7argsns_5args2INS0_6refns_4ref_IKNS3_IS5_NS7_INS9_IKNS3_INS4_11logical_notENS6_5args1INS9_IKNS3_INS4_10bitwise_orENS7_INS9_IKNS3_IS5_NS7_INS9_IKNS3_INS4_6assignENS7_INS9_IKNS3_INS4_8terminalENS6_5args0INS_9xpressive6detail16mark_placeholderEEELl0EEEEENS9_IKNS3_INS4_5positENSB_INS9_IKNS3_INS4_9subscriptENS7_INS9_IKNS3_ISE_NSF_INSH_15set_initializerEEELl0EEEEENS9_IKNS3_ISC_NS7_INS9_IKNS3_ISC_NS7_INS9_IKNS3_ISC_NS7_INS9_IKNS3_ISE_NSF_INSH_25posix_charset_placeholderEEELl0EEEEENS3_ISE_NSF_IRKcEELl0EEEEELl2EEEEES12_EELl2EEEEESY_EELl2EEEEEEELl2EEEEEEELl1EEEEEEELl2EEEEES12_EELl2EEEEENS9_IKNS3_IS5_NS7_INS9_IKNS3_IS5_NS7_IS1U_S1Q_EELl2EEEEES12_EELl2EEEEEEELl2EEEEEEELl1EEEEES1Q_EELl2EEEEENS9_IKNS3_ISA_NSB_INS9_IKNS3_IS5_NS7_IS12_NS9_IKNS3_ISD_NS7_ISM_NS9_IKNS3_ISN_NSB_ISY_EELl1EEEEEEELl2EEEEEEELl2EEEEEEELl1EEEEEEELl2EEENS3_ISD_NS7_ISM_NS9_IKNS3_IS5_NS7_IS12_NS9_IKNS3_INS4_11dereferenceENSB_INS9_IKNS3_INS4_10complementENSB_INS9_IKNS3_ISE_NSF_IcEELl0EEEEEEELl1EEEEEEELl1EEEEEEELl2EEEEEEELl2EEEEEKNS0_6detail10as_expr_ifIS5_KT_KT0_vvE4typeERS3K_RS3M_", len = 5}, s_operator = {op = 0x4024f3c37}, s_extended_operator = {args = 38747191, name = 0xdddddddd00000005},
    s_ctor = {kind = 38747191, name = 0xdddddddd00000005}, s_dtor = {kind = 38747191, name = 0xdddddddd00000005}, s_builtin = {type = 0x4024f3c37},
    s_string = {
      string = 0x4024f3c37 "boost5proto7exprns_rsINS1_4exprINS0_3tag11shift_rightENS0_7argsns_5args2INS0_6refns_4ref_IKNS3_IS5_NS7_INS9_IKNS3_INS4_11logical_notENS6_5args1INS9_IKNS3_INS4_10bitwise_orENS7_INS9_IKNS3_IS5_NS7_INS9_IKNS3_INS4_6assignENS7_INS9_IKNS3_INS4_8terminalENS6_5args0INS_9xpressive6detail16mark_placeholderEEELl0EEEEENS9_IKNS3_INS4_5positENSB_INS9_IKNS3_INS4_9subscriptENS7_INS9_IKNS3_ISE_NSF_INSH_15set_initializerEEELl0EEEEENS9_IKNS3_ISC_NS7_INS9_IKNS3_ISC_NS7_INS9_IKNS3_ISC_NS7_INS9_IKNS3_ISE_NSF_INSH_25posix_charset_placeholderEEELl0EEEEENS3_ISE_NSF_IRKcEELl0EEEEELl2EEEEES12_EELl2EEEEESY_EELl2EEEEEEELl2EEEEEEELl1EEEEEEELl2EEEEES12_EELl2EEEEENS9_IKNS3_IS5_NS7_INS9_IKNS3_IS5_NS7_IS1U_S1Q_EELl2EEEEES12_EELl2EEEEEEELl2EEEEEEELl1EEEEES1Q_EELl2EEEEENS9_IKNS3_ISA_NSB_INS9_IKNS3_IS5_NS7_IS12_NS9_IKNS3_ISD_NS7_ISM_NS9_IKNS3_ISN_NSB_ISY_EELl1EEEEEEELl2EEEEEEELl2EEEEEEELl1EEEEEEELl2EEENS3_ISD_NS7_ISM_NS9_IKNS3_IS5_NS7_IS12_NS9_IKNS3_INS4_11dereferenceENSB_INS9_IKNS3_INS4_10complementENSB_INS9_IKNS3_ISE_NSF_IcEELl0EEEEEEELl1EEEEEEELl1EEEEEEELl2EEEEEEELl2EEEEEKNS0_6detail10as_expr_ifIS5_KT_KT0_vvE4typeERS3K_RS3M_", len = 5}, s_number = {number = 17218616375}, s_character = {character = 38747191}, s_binary = {left = 0x4024f3c37,
      right = 0xdddddddd00000005}}}

(gdb)p *dpi
$3 = {options = 259,
  buf = "xpr<boost::proto::tag::terminal, boost::proto::argsns_::args0<ost::proto::argsns_::args2<boost::proto::refns_::ref_<boost::proto::exprns_::expr<boost::proto::tag::bitwise_or, boost::proto::argsns_::args2<boost::proto::refns_::ref_<boost::proto::exprns_::e", len = 62, last_char = 60 '<',
  callback = 0x38095b25 <d_growable_string_callback_adapter>, opaque = 0x403069340, templates = 0x0, modifiers = 0x0, demangle_failure = 0, pack_index = 4}

I also changed VG_STACK_ACTIVE_SZB to 262144, with no luck. The number of frames in the stack didn't change after this as I expected. (But I really don't know what this define does.)

And another strange thing (in the top frame):
(gdb) p *dc->u.s_binary.left
$6 = {type = 1936682850, u = {s_name = {s = 0x72707865376f746f <Address 0x72707865376f746f out of bounds>, len = 1918858094}, s_operator = {
      op = 0x72707865376f746f}, s_extended_operator = {args = 930051183, name = 0x534e4973725f736e}, s_ctor = {kind = 930051183,
      name = 0x534e4973725f736e}, s_dtor = {kind = 930051183, name = 0x534e4973725f736e}, s_builtin = {type = 0x72707865376f746f}, s_string = {
      string = 0x72707865376f746f <Address 0x72707865376f746f out of bounds>, len = 1918858094}, s_number = {number = 8246223293832459375}, s_character = {
      character = 930051183}, s_binary = {left = 0x72707865376f746f, right = 0x534e4973725f736e}}}

Maybe this is normal; I don't know.
This is the first time I am looking inside valgrind.
Comment 9 Mishael A Sibiryakov 2009-07-14 14:33:02 UTC
Created attachment 35313 [details]
Complete run log
Comment 10 Mishael A Sibiryakov 2009-07-14 14:33:56 UTC
bt full with 'set print elements 0' attached.
Comment 11 Josef Weidendorfer 2009-07-14 14:48:43 UTC
> Hi.
> I've tried to check this problem itself (i running gdb directly on valgrind
> without VALGIND_LAUNCHER and etc. Anyway result is the same :)

Interesting. I did not know that gdb follows exec. Anyway...

> It crashed:
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000038096197 in d_print_comp (dpi=0x403069150, dc=0x404f96030) at
> m_demangle/cp-demangle.c:3256

> ... backtrace attached

Quite a few frames recursively in "d_print_comp".
That looks exactly like the problem I debugged.
As mentioned above, this was a stack underrun.

What is the value of the stack pointer rsp ("info registers")?
Can you compare rsp against the stack area shown for this process
in /proc/.../maps ?
How large is the stack area (it should change with VG_STACK_ACTIVE_SZB)?
Comment 12 Mishael A Sibiryakov 2009-07-14 15:02:26 UTC
Looks like a stack overrun

(gdb)i registers
rax            0x1      1
rbx            0x404f96498      17263322264
rcx            0x0      0
rdx            0x1      1
rsi            0x404f96030      17263321136
rdi            0x403069150      17230631248
rbp            0x403069150      0x403069150
> rsp            0x40305c000      0x40305c000
r8             0x404fa7741      17263392577
r9             0x404fa7642      17263392322
r10            0x8      8
r11            0x0      0
r12            0x404f96468      17263322216
r13            0x3817fe91       941096593
r14            0x2      2
r15            0x0      0
rip            0x38096197       0x38096197 <d_print_comp+2>
eflags         0x10297  [ CF PF AF SF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x27f    639
fstat          0x0      0
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x0      0
foseg          0x0      0
fooff          0x0      0
fop            0x0      0
mxcsr          0x1f80   [ IM DM ZM OM UM PM ]

00400000-00431000 r-xp 00000000 08:07 7274619                            /work/Development/vgc/test
00630000-00631000 r--p 00030000 08:07 7274619                            /work/Development/vgc/test
00631000-00632000 rw-p 00031000 08:07 7274619                            /work/Development/vgc/test
04000000-0401c000 r-xp 00000000 08:05 4375412                            /lib64/ld-2.10.1.so
0401c000-0401e000 rw-p 0401c000 00:00 0
0404b000-0404d000 rw-p 0404b000 00:00 0
0421b000-0421c000 r--p 0001b000 08:05 4375412                            /lib64/ld-2.10.1.so
0421c000-0421d000 rw-p 0001c000 08:05 4375412                            /lib64/ld-2.10.1.so
0421d000-0421e000 rwxp 0421d000 00:00 0
04a1d000-04a1e000 r-xp 00000000 08:05 106605                             /usr/lib64/valgrind/vgpreload_core-amd64-linux.so
04a1e000-04c1d000 ---p 00001000 08:05 106605                             /usr/lib64/valgrind/vgpreload_core-amd64-linux.so
04c1d000-04c1e000 r--p 00000000 08:05 106605                             /usr/lib64/valgrind/vgpreload_core-amd64-linux.so
04c1e000-04c1f000 rw-p 00001000 08:05 106605                             /usr/lib64/valgrind/vgpreload_core-amd64-linux.so
04c1f000-04d07000 r-xp 00000000 08:05 606264                             /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6.0.8
04d07000-04f07000 ---p 000e8000 08:05 606264                             /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6.0.8
04f07000-04f0e000 r--p 000e8000 08:05 606264                             /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6.0.8
04f0e000-04f10000 rw-p 000ef000 08:05 606264                             /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6.0.8
04f10000-04f22000 rw-p 04f10000 00:00 0
04f22000-04fa2000 r-xp 00000000 08:05 4375410                            /lib64/libm-2.10.1.so
04fa2000-051a1000 ---p 00080000 08:05 4375410                            /lib64/libm-2.10.1.so
051a1000-051a2000 r--p 0007f000 08:05 4375410                            /lib64/libm-2.10.1.so
051a2000-051a3000 rw-p 00080000 08:05 4375410                            /lib64/libm-2.10.1.so
051a3000-051b0000 r-xp 00000000 08:05 5243031                            /lib64/libgcc_s.so.1
051b0000-053af000 ---p 0000d000 08:05 5243031                            /lib64/libgcc_s.so.1
053af000-053b0000 r--p 0000c000 08:05 5243031                            /lib64/libgcc_s.so.1
053b0000-053b1000 rw-p 0000d000 08:05 5243031                            /lib64/libgcc_s.so.1
053b1000-054f6000 r-xp 00000000 08:05 4375420                            /lib64/libc-2.10.1.so
054f6000-056f6000 ---p 00145000 08:05 4375420                            /lib64/libc-2.10.1.so
056f6000-056fa000 r--p 00145000 08:05 4375420                            /lib64/libc-2.10.1.so
056fa000-056fb000 rw-p 00149000 08:05 4375420                            /lib64/libc-2.10.1.so
056fb000-05700000 rw-p 056fb000 00:00 0
38000000-381ba000 r-xp 00000000 08:05 136898                             /usr/lib64/valgrind/callgrind-amd64-linux
383b9000-383bb000 rw-p 001b9000 08:05 136898                             /usr/lib64/valgrind/callgrind-amd64-linux
383bb000-38c02000 rw-p 383bb000 00:00 0                                  [heap]
402001000-40305a000 rwxp 402001000 00:00 0
> 40305a000-40305c000 ---p 40305a000 00:00 0
40305c000-40306c000 rwxp 40305c000 00:00 0
40306c000-40306e000 ---p 40306c000 00:00 0
40306e000-404fb6000 rwxp 40306e000 00:00 0
7feffe000-7ff001000 rwxp 7feffe000 00:00 0
7fff03a15000-7fff03a2b000 rw-p 7ffffffe9000 00:00 0                      [stack]
7fff03bfe000-7fff03bff000 r-xp 7fff03bfe000 00:00 0                      [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

VG_STACK_ACTIVE_SZB is now 262144
Comment 13 Mishael A Sibiryakov 2009-07-14 15:14:20 UTC
With VG_STACK_ACTIVE_SZB set to PAGE_SIZE * 512, I get the same crash in the same place.
The maps and registers look the same.
Comment 14 Josef Weidendorfer 2009-07-14 15:24:29 UTC
This looks exactly like before the bug fix.
And changing VG_STACK_ACTIVE_SZB should change the stack space for the
tool...

Are you really sure that /usr/lib64/valgrind/callgrind-amd64-linux
is the newly compiled one (eg. checking the date of this file)?

Anyway: can you go up the stack to check where this huge stack
allocation comes from (compare rsp of the different stack frames)?
It should not be "d_demangle_callback", as that is exactly the place
I fixed...
Comment 15 Mishael A Sibiryakov 2009-07-14 15:37:06 UTC
Stack allocations:

(gdb) frame 0
#0  0x0000000038096197 in d_print_comp (dpi=0x403069150, dc=0x404f96030) at m_demangle/cp-demangle.c:3256
3256    in m_demangle/cp-demangle.c
(gdb) i registers rsp
rsp            0x40305c000      0x40305c000
(gdb) up
#1  0x000000003809623b in d_print_comp (dpi=0x403069150, dc=0x404f96498) at m_demangle/cp-demangle.c:3276
3276    in m_demangle/cp-demangle.c
(gdb) i registers rsp
rsp            0x40305c010      0x40305c010
(gdb) up
#2  0x000000003809623b in d_print_comp (dpi=0x403069150, dc=0x404f964c8) at m_demangle/cp-demangle.c:3276
3276    in m_demangle/cp-demangle.c
(gdb) i registers rsp
rsp            0x40305c1c0      0x40305c1c0
Comment 16 Mishael A Sibiryakov 2009-07-14 15:56:31 UTC
Damn, I was blind... I had changed the stack size for ppc64. Objdump saved me :)

Now the stack size is correct (I've set it to PAGE_SIZE * 512 = 0x200000):

000000003801d390 <_start>:
    3801d390:   48 c7 c7 c0 60 95 38    mov    $0x389560c0,%rdi
    3801d397:   48 81 c7 00 20 00 00    add    $0x2000,%rdi
>    3801d39e:   48 81 c7 00 00 20 00    add    $0x200000,%rdi
    3801d3a5:   48 83 e7 f0             and    $0xfffffffffffffff0,%rdi
    3801d3a9:   48 87 fc                xchg   %rdi,%rsp
    3801d3ac:   e8 97 39 00 00          callq  38020d48 <_start_in_C_linux>
    3801d3b1:   f4                      hlt

And everything works fine. Except gdb :)
Running without gdb works perfectly, but under gdb:
--27629:1:mallocfr newSuperblock at 0x4051B6000 (pszB   69600) owner VALGRIND/demangle

Program received signal SIGSEGV, Segmentation fault.
0x00000004032edc82 in ?? ()
(gdb) bt
#0  0x00000004032edc82 in ?? ()
#1  0x000000000000cac7 in ?? ()
#2  0x0000000038b5bc80 in vgPlain_threads ()
#3  0x0000000000000000 in ?? ()
(gdb)
Comment 17 Josef Weidendorfer 2009-07-14 16:35:13 UTC
Ah, thanks. I just found that there is still a place in
coregrind/m_demangle/cp-demangle.c where dynamic arrays are
used. But your backtrace does not touch that place.

So it looks like the demangler can get to very deep recursion
levels with these symbols from the Boost library, and even
without dynamic arrays allocated on the stack, the 64k stack
space is not enough... your example shows a recursion depth
of 124 for d_print_comp at the time of the stack overrun...

Hmm... there seems to be no way around increasing the stack space.
But it looks like this Boost xpressive library can produce
arbitrarily long symbol names (depending on the complexity
of the grammar?), so we will always hit this bug at some
point. Not good :(
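
To put rough numbers on the above: a minimal C sketch of the stack budget arithmetic, assuming that the 64k stack was used up almost entirely by the 124 d_print_comp frames. That is only an approximation, since other frames share the same stack. The function names are hypothetical and not part of Valgrind.

```c
#include <assert.h>
#include <stddef.h>

/* Average bytes of stack consumed per recursion level, given the stack
   size and the recursion depth at which the stack overran. */
size_t avg_frame_bytes(size_t stack_bytes, size_t depth_at_overrun)
{
    assert(depth_at_overrun > 0);
    return stack_bytes / depth_at_overrun;
}

/* Rough number of recursion levels that fit into a stack of the given
   size, assuming every level costs frame_bytes. */
size_t est_max_depth(size_t stack_bytes, size_t frame_bytes)
{
    assert(frame_bytes > 0);
    return stack_bytes / frame_bytes;
}
```

With these assumptions, avg_frame_bytes(65536, 124) gives 528 bytes per level, and est_max_depth(2097152, 528) gives 3971 levels for a 2MB stack. A larger stack only raises the limit; it does not remove it.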
Comment 18 Mishael A Sibiryakov 2009-07-14 16:56:54 UTC
I think you're right about xpressive. As far as I know, xpressive makes heavy use of templates, and in some cases the symbols can be madly huge.
Limiting the stack size to ~2 MB is not a clever solution. I don't think it would be hard to write code which exhausts this limit; maybe the xpressive tests in the Boost library can do that.
Actually, I think you need to avoid recursion, maybe with a loop or something like it. As I understand it, the recursion is mainly used for filling 'struct d_print_info'.

Anyway, thanks for your help. If you need, I can do some work on this issue.
Comment 19 Josef Weidendorfer 2009-07-14 17:34:35 UTC
IMHO rewriting the demangler from recursive to iterative style is
out of the question: as far as I know, we just copied this code over
from binutils, and if there are new demangler types to support, it
would be good to do this again. So it is better not to diverge
too much from upstream here.

On the other hand, enlarging the tool stack size to 2MB on 64bit
should be fine. For larger symbols, it would be good to catch
a stack overrun and print a warning.
Comment 20 Mishael A Sibiryakov 2009-07-14 17:36:58 UTC
Agreed.
Comment 21 Nicholas Nethercote 2009-08-11 02:02:21 UTC
The original problem was fixed, and the discussion from comment 7 onwards was unclear (to me, at least) as to whether there was still a real problem.  So I will mark this as fixed.  If similar problems are encountered again, please open a new bug.  Thanks.
Comment 22 Josef Weidendorfer 2009-08-11 11:09:25 UTC
Hmm... while the original test case now works, the bug report is still
valid, as the second test case shows: there is a stack overrun of the
tool's stack space because of demangling a huge symbol; and the crash
report does not really give a hint how to solve this.

However, as the Boost library obviously can create symbols almost
unbounded in size with the above parser library, even enlarging the stack
from 64k to 1MB (or something like that) will only solve the problem
for some test cases. A solution would be one of:
* Heavily enlarge the stack space (to 4/8MB ?), and ignore the issue
* Check whether VG crashes in the demangler, and suggest to the
user to increase tool stack size & recompile.
* Get rid of demangling in VG altogether. However, in Callgrind I
allow actions to be triggered depending on user-specified symbols. For
regular-sized symbols this is a valid feature & needs "online" demangling
* Rewrite the demangler into iterative style, using heap space as needed
* Rewrite the demangler to stop demangling when a given resulting
length is reached.
* Or just write a FAQ entry: "Expect VG to crash with huge C++ symbols
(as seen with Boost), see bug 197988", and mark the bug as "WontFix".

Reopened for now.
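
For illustration only, the "iterative style, using heap space" option from the list above could look roughly like the following C sketch. It is not the demangler's code: struct node is a hypothetical stand-in for a demangler component, the traversal merely counts nodes, and the standard malloc/realloc are used here for simplicity rather than whatever allocator the tool would really need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a demangler component tree node. */
struct node {
    struct node *left;
    struct node *right;
};

/* Visit every node in pre-order without recursion. Pending nodes are
   kept on an explicit stack in heap memory, so the depth of the tree
   no longer consumes call stack. Returns 0 on success and -1 if memory
   runs out. */
int count_nodes_iterative(const struct node *root, size_t *count_out)
{
    size_t cap = 16, top = 0, count = 0;
    const struct node **stack;

    assert(count_out != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;
    if (root != NULL)
        stack[top++] = root;

    while (top > 0) {
        const struct node *n = stack[--top];
        count++;

        /* At most two children are pushed per node: make room first. */
        if (top + 2 > cap) {
            size_t new_cap = cap * 2;
            const struct node **grown = realloc(stack, new_cap * sizeof *grown);
            if (grown == NULL) {
                free(stack);
                return -1;
            }
            stack = grown;
            cap = new_cap;
        }
        if (n->right != NULL)
            stack[top++] = n->right;
        if (n->left != NULL)
            stack[top++] = n->left;   /* left is popped first */
    }

    free(stack);
    *count_out = count;
    return 0;
}
```

The point of the sketch is only that an out-of-memory condition can be reported as an error code, whereas an overrun of the tool stack ends in SIGSEGV.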
Comment 23 Mishael A Sibiryakov 2009-08-11 11:38:34 UTC
Now I use VG with a 2 MB stack size and it works in my case. But I think the best (fastest and easiest) choice is to:
* Increase the stack size to 1-2 MB
* Check the recursion level and stop demangling very large symbols (the recursion level limit depends on the stack size)

You can also show a warning to the user in that case.

PS: IMHO the recompile and FAQ variants are terrible, because after each update you must fix this problem again and again.
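
A minimal C sketch of what I mean by checking the recursion level, under the assumption that the caller can fall back to the mangled name when the limit is hit. struct comp and visit_limited are hypothetical names for illustration, not the real demangler types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a demangler component tree node. */
struct comp {
    struct comp *left;
    struct comp *right;
};

/* Recursive walk with a depth guard. Returns 1 if the whole tree was
   visited and 0 if max_depth was reached, in which case the caller
   should give up (for example keep the mangled name and print a
   warning) instead of overrunning the stack. */
int visit_limited(const struct comp *c, unsigned depth, unsigned max_depth,
                  size_t *visited)
{
    assert(visited != NULL);
    if (c == NULL)
        return 1;
    if (depth >= max_depth)
        return 0;   /* too deep: stop here */
    (*visited)++;
    return visit_limited(c->left, depth + 1, max_depth, visited)
        && visit_limited(c->right, depth + 1, max_depth, visited);
}
```

max_depth would have to be derived from the stack size and the frame size, so it needs to change whenever VG_STACK_ACTIVE_SZB changes.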
Comment 24 Nicholas Nethercote 2009-08-12 00:46:24 UTC
First question: is the bug still visible in a real program, or do we just know it could happen?

> * Increase stack size to 1-2mb

Is that by increasing VG_STACK_ACTIVE_SZB?

> * Check for the recursion level and stop demangling on very large symbols
> (recursion level depend on the stack size)
> 
> Also you can show warning to the user in that case.
> 
> PS: Imho Recompile and FAQ variants are terrible, because after the update you
> must fix this problem again and again..

Josef, what's the simplest fix that you would be happy with -- would checking the recursion level be fairly simple?  Could we decrease the size of the stack frame?

We really want to resolve this in the next 2 days so Julian can create a release candidate on Friday.
Comment 25 Mishael A Sibiryakov 2009-08-12 08:24:14 UTC
> > * Increase stack size to 1-2mb
> 
> Is that by increasing VG_STACK_ACTIVE_SZB?

Yep.

$ svn st
?       tags
X       VEX
M       coregrind/pub_core_aspacemgr.h

Performing status on external item at 'VEX'
$ svn di
Index: coregrind/pub_core_aspacemgr.h
===================================================================
--- coregrind/pub_core_aspacemgr.h      (revision 10440)
+++ coregrind/pub_core_aspacemgr.h      (working copy)
@@ -376,7 +376,7 @@
 # define VG_STACK_ACTIVE_SZB 131072 // 2 or 32 pages
 #else
 # define VG_STACK_GUARD_SZB  8192   // 2 pages
-# define VG_STACK_ACTIVE_SZB 65536  // 16 pages
+# define VG_STACK_ACTIVE_SZB (PAGE_SIZE * 512) // 2Mb
 #endif

 typedef
$
Comment 26 Julian Seward 2009-08-13 01:14:09 UTC
Probably the simplest thing is to just change the stack size
to (eg) 256k, and run with that for a while.
Comment 27 Mishael A Sibiryakov 2009-08-13 01:30:22 UTC
Why so small?
The stack is allocated once, and 2M is more adequate than 256k. Besides, it is now difficult to find a host with less than 1 GB of RAM.

Nobody dies from 2 MB, I think.
Comment 28 Nicholas Nethercote 2009-08-13 01:34:21 UTC
Yes, and 640kB of RAM should be enough for anyone...
Comment 29 Mishael A Sibiryakov 2009-08-13 01:49:11 UTC
That was a rhetorical question. But at the moment I have at least one program which requires this stack size in callgrind.

Anyway, this is a useless debate, and you have my opinion.
Comment 30 Nicholas Nethercote 2009-08-14 01:29:33 UTC
Downgrading from blocker3.5.0 to wanted3.5.1.  Julian, please increase the stack size if you are comfortable with that.
Comment 31 Julian Seward 2009-08-15 02:11:22 UTC
I propose to back out r10385, as it doesn't fix the problem as far
as I can tell from the discussion, set the stack size to 256k, and
leave it at that.

Surely /usr/bin/{as,ld} cannot handle arbitrarily long symbols either.
I don't imagine this problem is unique to Valgrind.
Comment 32 Josef Weidendorfer 2009-08-17 17:52:53 UTC
For 3.5.0, I am fine with that.
Comment 33 Julian Seward 2009-08-17 18:39:00 UTC
Backed out (r10837).
Comment 34 Nicholas Nethercote 2009-08-21 00:52:23 UTC
Bug 204572 might be a dup of this one.
Comment 35 Alex Ivershen 2009-08-25 00:43:51 UTC
*** Bug 204572 has been marked as a duplicate of this bug. ***
Comment 36 Alex Ivershen 2009-08-25 00:45:46 UTC
Update from bug 204572. Increasing VG_STACK_ACTIVE_SZB to 2 Mb fixed the issue for me as well.
Comment 37 Josef Weidendorfer 2009-12-08 17:39:52 UTC
Bug 217863 might be a duplicate of this bug.
Comment 38 Andrew C. Morrow 2010-03-05 23:59:07 UTC
FWIW, one simple but ugly way to work around this issue, if it is affecting you, is to run callgrind with --demangle=no on the command line, and then pre-process the resulting callgrind.out file through c++filt (or the equivalent) before giving it to kcachegrind.

This seems to work fine.
Comment 39 Josef Weidendorfer 2010-03-08 15:33:43 UTC
[answer to comment #38]

Yes, switching off the demangler in Callgrind is a good strategy as a
work-around. If Callgrind were able to register an action on
stack underruns in the demangler, we could output this work-around
as a tip, and IMHO this would be enough to close this bug.

However, one has to understand the side effect that any configured actions
for entering/leaving a function (--dump-before=<func>, --zero-before=<func>)
may need to be adjusted to match mangled symbols.
Comment 40 Diggory Hardy 2010-05-07 18:14:02 UTC
Think I've hit the same issue (in 3.5.0). Didn't try rebuilding valgrind, but at any rate, turning off name demangling helps.

(This is with an open source program, so if anyone wants to try reproducing, I'll post a guide.)
Comment 41 Stephen Pope 2010-05-26 00:44:00 UTC
I have also run into this same problem with 3.5.0, on other boost symbols (not xpressive, but with boost::mpl symbols). Too many levels of recursion in the demangling of ugly long symbols. Bumping the stack size (VG_STACK_ACTIVE_SZB) up to 2M has fixed it.
Comment 42 Julian Seward 2010-05-26 01:01:56 UTC
Yes, need to resolve this properly for 3.6.0.
Comment 43 Julian Seward 2010-05-26 01:02:52 UTC
(In reply to comment #41)
> demangling of ugly long symbols. Bumping the stack size (VG_STACK_ACTIVE_SZB)
> up to 2M has fixed it.

Is that literally the only change you did?  Can you send the patch?
Comment 44 Stephen Pope 2010-05-26 19:57:49 UTC
Yes. I changed one line, recompiled, and reinstalled:

% diff -u coregrind/pub_core_aspacemgr.h.orig coregrind/pub_core_aspacemgr.h
--- coregrind/pub_core_aspacemgr.h.orig 2009-08-19 07:37:47.000000000 -0600
+++ coregrind/pub_core_aspacemgr.h      2010-05-25 16:02:40.607250000 -0600
@@ -376,7 +376,7 @@
 # define VG_STACK_ACTIVE_SZB 131072 // 2 or 32 pages
 #else
 # define VG_STACK_GUARD_SZB  8192   // 2 pages
-# define VG_STACK_ACTIVE_SZB 65536  // 16 pages
+# define VG_STACK_ACTIVE_SZB (4096 * 512) // 2Mb
 #endif
 
 typedef
Comment 45 yabo 2010-07-17 15:01:06 UTC
I ran into this issue with valgrind 3.5.0 and r11214 with g++ 4.5.0 on AMD64 with code heavily using templates.

By the way, the initial code using XPressive (Boost 1.43) still produces a segmentation fault when built with '-g' flag on the same platform using valgrind 3.5.0 or r11214.

Using the patch given in #44 solves both problems.
Comment 46 Julian Seward 2010-07-21 11:51:06 UTC
I increased the stack size to 1MB in r11215.  Hopefully that
is enough to solve most of these problems.  I don't want to 
throw huge stacks (2MB+) at the problem because some targets
only have 512MB of memory.

Closing.  If this problem is still happening to people, please
re-open.
Comment 47 Julian Seward 2010-07-21 12:59:01 UTC
*** Bug 240488 has been marked as a duplicate of this bug. ***
Comment 48 Julian Seward 2010-07-30 17:30:51 UTC
*** Bug 217863 has been marked as a duplicate of this bug. ***