Memory allocations of 34255421417 bytes and higher cause a std::bad_alloc exception to be thrown when running a process under valgrind, even though the process runs fine on its own. This is not a memory-overhead issue: I have run this on a machine with >250 GB of free memory and on a machine with <50 GB of free memory, with the same results. If the allocation is lowered to 34255421416 bytes, the exception is not thrown.

-bash-4.2$ cat test.cpp
/* Hello World program */
#include <iostream>
#include <stdio.h>
#include <unistd.h>

using namespace std;

typedef struct test test;
struct test {
    char arr[34255421417] = {};
};

int main() {
    cout << "Before Allocation\n";
    try {
        test *t1 = new test;
    } catch (const std::exception &exc) {
        std::string exception = exc.what();
        std::cerr << "Caught exception: " + exception + "\n";
    }
    cout << "After Allocation\n";
    cin.get();
    return 0;
}
-bash-4.2$ g++ -std=c++11 test.cpp
-bash-4.2$ valgrind --tool=cachegrind ./a.out
==43814== Cachegrind, a cache and branch-prediction profiler
==43814== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
==43814== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==43814== Command: ./a.out
==43814==
--43814-- warning: L3 cache found, using its data for the LL simulation.
Before Allocation
Caught exception: std::bad_alloc
After Allocation
==43814==
==43814== I   refs:      1,477,163
==43814== I1  misses:        1,756
==43814== LLi misses:        1,683
==43814== I1  miss rate:      0.11%
==43814== LLi miss rate:      0.11%
==43814==
==43814== D   refs:        499,289  (371,405 rd + 127,884 wr)
==43814== D1  misses:       12,494  ( 10,762 rd +   1,732 wr)
==43814== LLd misses:        7,472  (  6,209 rd +   1,263 wr)
==43814== D1  miss rate:       2.5% (    2.8%   +     1.3%  )
==43814== LLd miss rate:       1.4% (    1.6%   +     0.9%  )
==43814==
==43814== LL refs:          14,250  ( 12,518 rd +   1,732 wr)
==43814== LL misses:         9,155  (  7,892 rd +   1,263 wr)
==43814== LL miss rate:        0.4% (    0.4%   +     0.9%  )
-bash-4.2$ ./a.out
Before Allocation
After Allocation
-bash-4.2$ free -g
              total        used        free      shared  buff/cache   available
Mem:            251           2          33           0         215         248
Swap:             3           0           3
-bash-4.2$ valgrind --version
valgrind-3.10.0
Yes, there is a compiled-in memory limit imposed by the addressing scheme valgrind uses for its shadow memory. See https://stackoverflow.com/questions/8644234/why-is-valgrind-limited-to-32-gb-on-64-bit-architectures for more information, including how to patch valgrind to allow larger address spaces.
In the trunk right now we have N_PRIMARY_BITS = 20, which according to the svn log makes the maximum usable memory 64 GB. That change was made at end-Jan 2013 and should surely be in 3.10 and later. Maybe we should bump this up to 21 bits, hence giving 128 GB of usable memory on 64-bit targets? It would slow down startup a bit, because that array needs to be zeroed out, and would soak up a bit more memory, but otherwise seems harmless. Presumably at some point we can outrun (the ever decelerating) Moore's law with this game ;-)
The primary_map array, I mean. I didn't mean the whole 128GB needs to be zeroed out at startup.
I fully agree. Server systems these days even have TBs of memory to play with. In addition to initializing the primary map, N_PRIMARY_BITS also comes into play in mc_expensive_sanity_check(). Hopefully it won't be a big deal.
I had hoped to do this for 3.12.0, but after looking at the #ifdef swamp in VG_(am_startup) that sets aspacem_maxAddr, I think it is too risky, because of the number of different cases that need to be verified. So I'd propose to leave it till after the release. The number of users that this will affect is tiny and those that really need it in 3.12.x can cherry pick the trunk commit into their own custom 3.12.x build, once we fix it on the trunk.
Created attachment 105548 [details] test case (allocgig.c)
Created attachment 105549 [details] Proposed fix (so far, Linux only)
Fails at 32 GB without the patch:

trying for 31 GB .. ==12078== Warning: set address range perms: large range [0x3960c040, 0x7f960c040) (defined)
.. OK
==12078== Warning: set address range perms: large range [0x3960c028, 0x7f960c058) (noaccess)
trying for 32 GB .. allocgig: allocgig.c:15: main: Assertion `p' failed.
==12078==
==12078== Process terminating with default action of signal 6 (SIGABRT)
==12078==    at 0x4E6E428: raise (raise.c:54)
==12078==    by 0x4E70029: abort (abort.c:89)
==12078==    by 0x4E66BD6: __assert_fail_base (assert.c:92)
==12078==    by 0x4E66C81: __assert_fail (assert.c:101)
==12078==    by 0x400700: main (in /home/tomh/allocgig)

and at 64 GB with the patch:

trying for 63 GB .. ==14020== Warning: set address range perms: large range [0x10060c4040, 0x1fc60c4040) (defined)
.. OK
==14020== Warning: set address range perms: large range [0x10060c4028, 0x1fc60c4058) (noaccess)
trying for 64 GB .. allocgig: allocgig.c:15: main: Assertion `p' failed.
==14020==
==14020== Process terminating with default action of signal 6 (SIGABRT)
==14020==    at 0x4E6E428: raise (raise.c:54)
==14020==    by 0x4E70029: abort (abort.c:89)
==14020==    by 0x4E66BD6: __assert_fail_base (assert.c:92)
==14020==    by 0x4E66C81: __assert_fail (assert.c:101)
==14020==    by 0x400700: main (in /home/tomh/allocgig)
Created attachment 105561 [details]
Proposed fix (so far, Linux and Solaris only)

Solaris changes.
Unfortunately I do not have a machine with >32 GB of physical memory on which I can install Solaris and try it out. Note that Solaris does not overcommit when allocating memory. The regression tests passed OK.
Solaris and Linux limit increased to 128GB in r16381. OSX is so far unchanged. Rhys, do you want to change OSX too? I think nothing will break if OSX isn't changed. So, as you like.
Closing, for now. If we need to change the OSX limits later then, well, fine.
Any changes for macOS can come later.