432510 – RFE: ENOMEM fault injection mode

Status:	REPORTED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (other bugs)
Version First Reported In:	unspecified
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

Reported:	2021-02-04 15:53 UTC by Frank Ch. Eigler
Modified:	2021-02-15 20:50 UTC
CC List:	2 users

Description Frank Ch. Eigler 2021-02-04 15:53:04 UTC

It would be helpful to test programs' resilience to ENOMEM conditions under valgrind.  While testing code under /usr/bin/prlimit -d=XXXX can help, valgrind relaxes these ulimits for itself and for the emulated processes.  That hides this class of problem.

It would be nice if the intercepted malloc operations could be configured to fail under conditions such as reaching a particular total allocation, or randomly with some probability.
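As a rough illustration of the two triggers mentioned above (a total-allocation threshold and a random failure probability), here is a minimal standalone sketch in plain C. It is not valgrind code: `struct fi_state`, `fi_next` and `fi_malloc` are hypothetical names, the counters are an assumption about how such a mode might behave, and the wrapper sits on top of the ordinary libc `malloc` rather than valgrind's replacement functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fault-injection state; none of these names exist in valgrind. */
struct fi_state {
    uint64_t rng;              /* xorshift64 state, must be non-zero */
    uint32_t fail_per_million; /* failure probability in parts per million */
    size_t   total_limit;      /* cap on cumulative bytes handed out; 0 = no cap */
    size_t   total_bytes;      /* cumulative bytes handed out so far */
};

/* Small deterministic PRNG (xorshift64) so a failing run can be replayed
   from the same seed. */
static uint64_t fi_next(struct fi_state *s) {
    uint64_t x = s->rng;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    s->rng = x;
    return x;
}

/* malloc wrapper that fails artificially, either because the cumulative
   total would exceed the cap or because the random draw says so. */
static void *fi_malloc(struct fi_state *s, size_t n) {
    if (s->total_limit != 0 && n > s->total_limit - s->total_bytes) {
        errno = ENOMEM;
        return NULL;
    }
    if (fi_next(s) % 1000000u < s->fail_per_million) {
        errno = ENOMEM;
        return NULL;
    }
    void *p = malloc(n);
    if (p != NULL)
        s->total_bytes += n;
    return p;
}
```

Seeding the generator explicitly is a deliberate choice in this sketch: a randomly injected failure is only useful for debugging if the same run can be reproduced.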

Comment 1 Mark Wielaard 2021-02-10 22:40:48 UTC

This is a nice idea.

One tricky thing is that although the malloc-like functions can return NULL, they don't actually set errno. See https://bugs.kde.org/show_bug.cgi?id=217695. This may or may not be a problem for this idea.

There is a brief description of how the malloc replacement functions work at ./docs/internals/m_replacemalloc.txt

I am not sure what the right "level" is to implement this.
The choices are:

- ./coregrind/m_replacemalloc/vg_replace_malloc.c
  This is where the malloc-like functions are intercepted.
  The code runs as part of the program (it is LD_PRELOADED), which means
  you cannot easily call into the valgrind core. But you can simply replace
  any malloc-like call with a (random) return of NULL, which would then
  work with any valgrind tool that uses malloc replacement.

- In a specific tool, for example in memcheck/mc_malloc_wrappers.c
  See the VG_(needs_malloc_replacement) call in memcheck/mc_main.c
  to see which malloc-like functions are wrapped
  (MC_(malloc), MC_(calloc), MC_(realloc), etc.). Most of these
  call into MC_(new_block), which you could make conditionally return NULL
  based on the state of the tool (memcheck in this case).

Comment 2 Julian Seward 2021-02-11 13:24:36 UTC

(In reply to Frank Ch. Eigler from comment #0)

Before we get into discussing how to implement stuff, I think it's important to have a more precise statement of what functionality is required.  Can you give more details on what errors should be triggered, how they should be reported, and when they should be triggered?

Comment 3 Frank Ch. Eigler 2021-02-11 19:39:01 UTC

The initial use case was hinted at in #c0: to approximate a limited heap for a program.

> Can you give more details on what errors should be triggered, how they should be reported, and when they should be triggered?

I'm thinking valgrind could grow a new option for memcheck, --heapsize=<number>, which would be a limit of the total outstanding malloc-like allocations at any given time.  valgrind would return NULL/ENOMEM to the application when the limit is hit.  At the 30,000 ft level that'd be enough for my purposes.  The idea is not to have valgrind detect this as though it were an error condition (like a use-after-free or something), but to let the app react to an artificial error.
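To make the intended semantics concrete, here is a minimal sketch of a limit on outstanding (live) heap bytes, written as plain C on top of libc rather than as valgrind code. The names `heap_limit`, `heap_outstanding`, `limited_malloc` and `limited_free` are hypothetical, and the per-block header is just one assumed way of remembering each block's size; memcheck already tracks block sizes by its own means.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a --heapsize=<number> value, in bytes. */
static size_t heap_limit = 1024;
/* Bytes currently allocated and not yet freed. */
static size_t heap_outstanding = 0;

/* Header stored in front of each block so that free can subtract its size.
   The max_align_t member pads the header so the payload stays aligned. */
union hdr {
    size_t size;
    max_align_t align;
};

static void *limited_malloc(size_t n) {
    /* Refuse the request if it would push live bytes past the limit. */
    if (n > heap_limit - heap_outstanding || n > SIZE_MAX - sizeof(union hdr)) {
        errno = ENOMEM;
        return NULL;
    }
    union hdr *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    heap_outstanding += n;
    return h + 1; /* payload starts right after the header */
}

static void limited_free(void *p) {
    if (p == NULL)
        return;
    union hdr *h = (union hdr *)p - 1;
    heap_outstanding -= h->size; /* freeing makes room under the limit again */
    free(h);
}
```

Because the limit applies to outstanding bytes rather than to the cumulative total, a program that frees memory promptly can keep allocating indefinitely, which matches the "limited heap" reading of the request.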

It's complicated by mmap'd shared libraries, mmap(PRIVATE), sbrk(), ... but I believe just focusing on the heap alloc/free hook points valgrind tracks would be super useful for now.

Comment 4 Philippe Waroquiers 2021-02-15 20:50:40 UTC

To have a flexible way to specify when/where a memory allocation should fail,
we might use something that reuses (part of) the suppression infrastructure:

The user would give a file with 'suppression-like' entries, but instead of
suppressing errors, these entries would put a limit (in nr of allocated blocks
and/or nr of allocated bytes) after which a malloc would return NULL.

That should be relatively cheap in CPU time, as the matching between
the alloc stacktrace and the 'heap-limit supp entries' has to be done
only the first time a new stacktrace is stored in the list of stack traces.
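A minimal sketch of that caching idea, in standalone C rather than valgrind code: every type and function name below is hypothetical, and the matching rule (an entry matches when any frame equals its function name) is a deliberate simplification of real suppression matching. The sketch also assumes the counters are kept per stack trace, which the proposal leaves open.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 'heap-limit' entry: a function name plus the two limits. */
struct limit_entry {
    const char *fn;    /* matches a trace if any frame has this name */
    size_t max_blocks; /* limit on live blocks */
    size_t max_bytes;  /* limit on live bytes */
};

/* Hypothetical record for one distinct allocation stack trace. */
struct trace {
    const char *const *frames;       /* function names, innermost first */
    size_t n_frames;
    const struct limit_entry *limit; /* cached match result, may be NULL */
    int matched;                     /* 0 until the entries have been scanned */
    size_t blocks, bytes;            /* live allocations charged to this trace */
};

/* The expensive part: scan all entries against all frames. */
static const struct limit_entry *find_limit(const struct trace *t,
                                            const struct limit_entry *e,
                                            size_t n_e) {
    for (size_t i = 0; i < n_e; i++)
        for (size_t f = 0; f < t->n_frames; f++)
            if (strcmp(t->frames[f], e[i].fn) == 0)
                return &e[i];
    return NULL;
}

/* Returns 1 and charges the trace if the allocation may proceed,
   0 if the allocation should fail (the caller would return NULL). */
static int charge_alloc(struct trace *t, const struct limit_entry *e,
                        size_t n_e, size_t size) {
    if (!t->matched) { /* match only the first time this trace is seen */
        t->limit = find_limit(t, e, n_e);
        t->matched = 1;
    }
    if (t->limit != NULL) {
        if (t->blocks >= t->limit->max_blocks)
            return 0;
        if (size > t->limit->max_bytes - t->bytes)
            return 0;
    }
    t->blocks++;
    t->bytes += size;
    return 1;
}
```

After the first allocation from a given stack trace, each later allocation from the same trace costs only a pointer check and two comparisons, which is the reason the approach should be cheap.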