Bug 378732 - False positive "bad free" on recent versions of bash
Summary: False positive "bad free" on recent versions of bash
Status: RESOLVED INTENTIONAL
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.12.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-13 08:14 UTC by Reuben Thomas
Modified: 2017-04-14 21:14 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Reuben Thomas 2017-04-13 08:14:48 UTC
Running bash 4.3 as shipped in Ubuntu 16.04, or bash 4.4 or current git master head compiled from source on Ubuntu 16.04, I get:

$ valgrind bash -c 'exit 0'
==30344== Memcheck, a memory error detector
==30344== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30344== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==30344== Command: bash -c exit\ 0
==30344== 
==30344== Invalid free() / delete / delete[] / realloc()
==30344==    at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30344==    by 0x45E1D0: unwind_frame_run_internal.constprop.3 (unwind_prot.c:301)
==30344==    by 0x45E37F: without_interrupts (unwind_prot.c:107)
==30344==    by 0x45E37F: run_unwind_frame (unwind_prot.c:135)
==30344==    by 0x47B664: parse_and_execute (evalstring.c:421)
==30344==    by 0x4209D6: run_one_command (shell.c:1348)
==30344==    by 0x41F893: main (shell.c:695)
==30344==  Address 0x423b6e8 is in the brk data segment 0x4228000-0x423bfff
==30344== 
==30344== 
==30344== HEAP SUMMARY:
==30344==     in use at exit: 0 bytes in 0 blocks
==30344==   total heap usage: 66 allocs, 67 frees, 10,915 bytes allocated
==30344== 
==30344== All heap blocks were freed -- no leaks are possible
==30344== 
==30344== For counts of detected and suppressed errors, rerun with: -v
==30344== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

See https://lists.gnu.org/archive/html/bug-bash/2017-04/msg00042.html for an analysis by bash's maintainer. I confirmed this by adding printfs to show that the malloced address is the same one that valgrind later complains about.

It would be nice to get this fixed, as it causes problems for test suites of other programs (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=849517 )
Comment 1 Julian Seward 2017-04-13 09:59:13 UTC
Reuben!

Per Chet's last comment line

  There's no place where that value is part of a data segment.

I think that's a misreading of V's (admittedly ambiguous) claim about
the segment.  I think it means the brk segment.  And one way that could
happen is that the allocated block has for whatever reason been allocated
by the system (glibc?) malloc, which puts it in the brk segment, but has
been passed to the Valgrind-provided free function, hence causing the
reported error.  (Note -- unproven theory, at best).

So it might be some kind of malloc/free interception problem.  Have there
been any changes to bash lately that might account for that?
Comment 2 Reuben Thomas 2017-04-13 10:27:04 UTC
Julian!

Note that this problem goes back to at least bash 4.2, i.e. February 2011, as per the Debian bug report.

By default, bash seems to use its own malloc (and certainly in the case where I do a default "./configure && make" from git sources), so I guess it's easily possible.
Comment 3 Reuben Thomas 2017-04-13 10:29:32 UTC
And indeed configuring as

./configure --without-bash-malloc

works nicely.
Comment 4 Reuben Thomas 2017-04-13 10:33:52 UTC
(My reference to bash 4.2 in the Debian bug report is bogus, sorry, although that doesn't really matter now.)
Comment 5 Julian Seward 2017-04-13 11:38:37 UTC
(In reply to rrt from comment #2)
> By default, bash seems to use its own malloc (and certainly in the case
> where I do a default "./configure && make" from git sources), so I guess
> it's easily possible.

As a side note: in that case, what you report is symptomatic of bash
using its own malloc to allocate a block but the system free to release
it.  Could that be the case?
Comment 6 Reuben Thomas 2017-04-13 14:35:43 UTC
A bit of gdb does suggest that this is the case: i.e. bash's sh_malloc is called to allocate (and is not intercepted) but glibc's free is called to free.

I've updated the bash bug report thread, will confirm the outcome here.
Comment 7 Ivo Raisr 2017-04-13 16:54:43 UTC
This could be also confirmed by running Valgrind with --trace-malloc, for example:
   $ valgrind --trace-malloc=yes bash -c 'exit 0'

Free'd address should have appeared previously in one of malloc/calloc/realloc/...
Comment 8 Mark Wielaard 2017-04-13 17:26:52 UTC
I think the problem is that bash not only has its own malloc/free implementation (valgrind should intercept that just fine). But also has malloc wrappers for some malloc functions that call their internal counterparts directly (so valgrind won't know that is a matching allocation/deallocation function).

See xmalloc.h:

#if defined(USING_BASH_MALLOC) && !defined (DISABLE_MALLOC_WRAPPERS)
extern PTR_T sh_xmalloc __P((size_t, const char *, int));
extern PTR_T sh_xrealloc __P((void *, size_t, const char *, int));
extern void sh_xfree __P((void *, const char *, int));

#define xmalloc(x)      sh_xmalloc((x), __FILE__, __LINE__)
#define xrealloc(x, n)  sh_xrealloc((x), (n), __FILE__, __LINE__)
#define xfree(x)        sh_xfree((x), __FILE__, __LINE__)

#ifdef free
#undef free
#endif
#define free(x)         sh_xfree((x), __FILE__, __LINE__)
#endif  /* USING_BASH_MALLOC */
Comment 9 Mark Wielaard 2017-04-13 17:39:54 UTC
Yep, that seems to be it. If you explicitly add the following to config.h (I couldn't find the configure option that does that):

#define DISABLE_MALLOC_WRAPPERS 1

Then a bash build from source seems to work fine under valgrind (and finds a couple of small memory leaks).
Comment 10 Reuben Thomas 2017-04-13 20:02:51 UTC
Thanks very much Mark.

What's the solution? A suppression? (It would seem to go against Valgrind's philosophy to say "build bash without the (debugging) wrappers"!) But as highlighted in the Debian thread, it would be good to fix this.

(The other obvious this would be simply to build --without-bash-malloc; the bash maintainers say this would have performance implications.)

Or can valgrind be taught about the wrappers?
Comment 11 Mark Wielaard 2017-04-14 13:54:03 UTC
(In reply to rrt from comment #10)
> What's the solution? A suppression? (It would seem to go against Valgrind's
> philosophy to say "build bash without the (debugging) wrappers"!) But as
> highlighted in the Debian thread, it would be good to fix this.
> 
> (The other obvious this would be simply to build --without-bash-malloc; the
> bash maintainers say this would have performance implications.)
> 
> Or can valgrind be taught about the wrappers?

Some distros (Fedora in particular) already build with --without-bash-malloc. So it seems well tested. I would encourage the bash maintainer to contact the glibc hackers and show them some benchmarks where the generic system malloc performs worse than the bash one. They are really interested in making sure the system malloc performs well for everybody.

If you want to keep the bash specific malloc implementation, but want to restore the malloc interposition that breaks valgrind (and anything else that depends on malloc interposition working correctly) then building with -DDISABLE_MALLOC_WRAPPERS=1 seems like the easiest way forward. You might want to ask the bash maintainers whether they feel that is a supported option. It seems to only be used to some malloc debug/tracing messages.

There are ways to teach valgrind about allocators that don't obey malloc interposition, but they are slightly involved. See http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs

If you do want to make the sh_xmalloc wrappers work for valgrind with normal malloc interposition (like the normal xmalloc calls) then you could make them just call malloc/free directly when running under valgrind (it would obviously still be broken for other malloc interposers):

diff -ur bash-4.4/xmalloc.c bash-4.4.valgrind/xmalloc.c
--- bash-4.4/xmalloc.c	2016-08-26 15:08:14.000000000 +0200
+++ bash-4.4.valgrind/xmalloc.c	2017-04-14 15:49:26.832197332 +0200
@@ -35,6 +35,8 @@
 #  include "ansi_stdlib.h"
 #endif /* HAVE_STDLIB_H */
 
+#include <valgrind/valgrind.h>
+
 #include "error.h"
 
 #include "bashintl.h"
@@ -180,7 +182,10 @@
 #endif
 
   FINDBRK();
-  temp = sh_malloc (bytes, file, line);
+  if (RUNNING_ON_VALGRIND)
+    temp = malloc (bytes);
+  else
+    temp = sh_malloc (bytes, file, line);
 
   if (temp == 0)
     sh_allocerr ("xmalloc", bytes, file, line);
@@ -203,7 +208,10 @@
 #endif
 
   FINDBRK();
-  temp = pointer ? sh_realloc (pointer, bytes, file, line) : sh_malloc (bytes, file, line);
+  if (RUNNING_ON_VALGRIND)
+    temp = pointer ? realloc (pointer, bytes) : malloc (bytes);
+  else
+    temp = pointer ? sh_realloc (pointer, bytes, file, line) : sh_malloc (bytes, file, line);
 
   if (temp == 0)
     sh_allocerr ("xrealloc", bytes, file, line);
@@ -218,6 +226,11 @@
      int line;
 {
   if (string)
-    sh_free (string, file, line);
+    {
+      if (RUNNING_ON_VALGRIND)
+	free (string);
+      else
+	sh_free (string, file, line);
+    }
 }
 #endif
Comment 12 Reuben Thomas 2017-04-14 21:12:30 UTC
Thanks very much for the detailed guidance, Mark.
Comment 13 Reuben Thomas 2017-04-14 21:14:40 UTC
Closing. I think "WONTFIX" is strictly accurate, as Valgrind's diagnosis is indeed incorrect.