Bug 138424 - Valgrind Abort with "failed in UME with error 22"
Status: ASSIGNED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.2.1
Platform: Compiled Sources Linux
Importance: NOR major
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Duplicates: 138856
Depends on:
Blocks:
 
Reported: 2006-12-06 02:24 UTC by Doug McGrath
Modified: 2015-09-12 06:29 UTC (History)
CC List: 15 users

Description Doug McGrath 2006-12-06 02:24:23 UTC
I'm using valgrind to debug an application with mixed C and Fortran code (yes, 
I know). This application maps very large shared memory segments, and the size 
varies according to our customer settings. I've used valgrind successfully on 
this application, but for one customer's system, it aborts immediately after 
startup with the message "failed in UME with error 22". This particular system 
is sized quite large for this application, so I suspect that the problem is 
related to that.
Comment 1 Julian Seward 2006-12-25 09:26:27 UTC
*** Bug 138856 has been marked as a duplicate of this bug. ***
Comment 2 Julian Seward 2006-12-27 06:24:27 UTC
Can't easily fix this, but I did commit a change to make it print 
a more intelligible error message.  See bug 138856 for background.
Comment 3 Gerard Gorman 2007-01-25 15:03:51 UTC
I'm also seeing this in version 3.2.2 for an application written in Fortran and compiled as 64-bit on x86_64. This is a problem because large static arrays are a fact of life with f77.
Comment 4 Nicholas Nethercote 2009-06-30 06:35:00 UTC
I'm closing crashing and similar bugs that are more than two years old.  If 
you still see this problem with Valgrind 3.4.1 please reopen the bug report.
Thanks.
Comment 5 Branden Moore 2009-12-01 17:29:00 UTC
Nicholas,

  I can confirm that with valgrind 3.5.0, this issue still exists.
Comment 6 Matt Hargett 2010-01-19 21:39:06 UTC
I can verify that this happens on Ubuntu 9.10 on amd64 with current SVN:

$ svn update
$ svn info
Path: .
URL: svn://svn.valgrind.org/valgrind/trunk
Repository Root: svn://svn.valgrind.org/valgrind
Repository UUID: a5019735-40e9-0310-863c-91ae7b9d1cf9
Revision: 11027
Node Kind: directory
Schedule: normal
Last Changed Author: bart
Last Changed Rev: 11027
Last Changed Date: 2010-01-17 03:02:23 -0800 (Sun, 17 Jan 2010)
$ make distclean
[...]
$ autoreconf
$ ./configure --prefix=/home/matt
[...]
$ make install
[...]
$ ~/bin/valgrind /usr/bin/ls
valgrind: mmap(0x400000, 110592) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

I can't get it to work with any binary. I have tried --enable-only64bit and --enable-inner to see if either would work around the issue, and neither did.

Let me know if you need my config.log or anything to enable further debugging.
Comment 7 Drew Wilson 2010-02-10 20:00:15 UTC
We are seeing a similar problem running our Chromium tests under valgrind (more info: http://code.google.com/p/chromium/issues/detail?id=28439). 

Worker tests run under valgrind sporadically get an mmap error 22, which is odd because it looks like none of our static resources are larger than 0.5MB.
Comment 8 Drew Wilson 2010-02-10 20:07:10 UTC
BTW, I'd note that this bug is marked as NEEDSINFO/INVALID - I think we've provided enough information to reopen the bug, but my bugzilla-fu is clearly not up to the task of changing the status to REOPENED (perhaps I don't have permission for this?)
Comment 9 Julian Seward 2010-02-15 11:34:47 UTC
(In reply to comment #6)
> I can verify that this happens on Ubuntu 9.10 on amd64 with current SVN:

I don't disbelieve your report, but OTOH (1) we would never have
shipped 3.5.0 if it was so obviously broken, and (2) the current SVN
works fine for me on Ubuntu 9.10 amd64:

  $ cat /etc/issue
  Ubuntu 9.10 \n \l

  $ uname -a
  Linux nienna 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 02:39:34 UTC 2010 x86_64 GNU/Linux

  $ ./vg-in-place -q date
  Mon Feb 15 11:30:42 CET 2010

So I am inclined to believe that the failure you are seeing is the
result of some local configuration difference which is not
present in a vanilla 9.10 install.  Problem is I'm not sure what we're
looking for here.  Do you have some unusual ulimit setting, or some
enhanced security settings?  Any other differences from a vanilla
install?
Comment 10 Julian Seward 2010-02-15 11:44:05 UTC
Just to clarify: there are really two different problems here.

This failure will occur for executables with huge
text, data or bss segments; but "huge" means, like, 500MB kind of
size.  That's clearly what happened for the original report
(comment #0) and for comment #3.  This is a known and understood
problem, which isn't easy to fix; but at least we know what the problem
is.  The only known workaround is to try again with a 64-bit process
rather than a 32-bit one.

Now, it's clear that there is some second set of failures which do
not involve huge segments, as per comment #6, comment #7, comment #8.
Obviously I would like to fix this, but I do not know what the problem
is, and you (collectively) will need to provide more info or ideally
a simple case that reproduces this on a vanilla Ubuntu install.

Comment #5 is ambiguous; I can't see whether that falls into the
"huge segment" category or the "mysterious other failure" category.
Comment 11 Julian Seward 2010-02-15 11:47:09 UTC
Reopening.
Comment 12 Dan Kegel 2010-02-16 19:47:09 UTC
The problem in #6 might be bug 193413.  Matt, what does ld --version say?
If it mentions 'gold', you need to rebuild valgrind with plain old ld,
not with gold (at least until bug 193413 is fixed).
Comment 13 Julian Seward 2010-02-16 20:14:30 UTC
(In reply to comment #12)
Ah yes, excellent point.  So the test is to do

readelf -a memcheck/memcheck-amd64-linux | grep "Entry point address"

and we need to get a number of the form 0x38000000 + a little bit,
for example 0x38030980.  If so then there should be no problem.
If the result is very far different (eg 0x402430 ish or 0x8048420)
then it has been linked with gold and will fail for all programs,
even the smallest.
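
The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the 16 MiB tolerance for "a little bit" is an assumption, not a figure from Valgrind itself. The base 0x38000000 and the sample addresses are the ones quoted in this comment.

```python
# Hypothetical helpers for classifying readelf's "Entry point address".
# EXPECTED_BASE is the value quoted in this thread; TOLERANCE is an
# assumption picked purely for illustration.
EXPECTED_BASE = 0x38000000
TOLERANCE = 16 * 1024 * 1024


def parse_entry_point(readelf_line: str) -> int:
    """Extract the address from a line such as
    '  Entry point address:               0x38030980'."""
    return int(readelf_line.split(":", 1)[1].strip(), 16)


def entry_point_looks_ok(entry_point: int) -> bool:
    """True if the entry point sits just above the expected base."""
    return EXPECTED_BASE <= entry_point < EXPECTED_BASE + TOLERANCE


if __name__ == "__main__":
    for addr in (0x38030980, 0x402430, 0x8048420):
        verdict = "ok" if entry_point_looks_ok(addr) else "suspect, possibly linked with gold"
        print(f"{addr:#x}: {verdict}")
```

A line captured from the readelf command above could be passed through parse_entry_point first and the result handed to entry_point_looks_ok.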
Comment 14 Matt Hargett 2010-02-26 19:12:43 UTC
Yup, I'm using gold as my system linker, as I've been testing GCC 4.5's LTO feature quite a bit recently. I noticed that in Dan's binutils bug they say exactly how valgrind needs to be updated -- are there any plans to do that soon? Use of gold is likely to go up as people start using GCC 4.5.

Sorry for the confusion!
Comment 15 Timur Iskhodzhanov 2010-02-26 19:21:26 UTC
Just to clarify, the Chromium Issue #28439 mentioned above happens even though Valgrind is NOT linked with gold there.
Comment 16 Pete Flugstad 2011-11-21 21:16:51 UTC
I can confirm that this bug exists on Ubuntu 11.10 (X86_64) with valgrind 3.6.1:

  $ valgrind --version
  valgrind-3.6.1-Debian

  $ cat /etc/lsb-release 
  DISTRIB_ID=Ubuntu
  DISTRIB_RELEASE=11.10
  DISTRIB_CODENAME=oneiric
  DISTRIB_DESCRIPTION="Ubuntu 11.10"

  $ uname -a
  Linux atlas 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:27:26 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

  $ ld --version
  GNU ld (GNU Binutils for Ubuntu) 2.21.53.20110810

  $ file upsample
  upsample: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

  $ valgrind upsample ...
  valgrind: mmap(0x654000, 2836611072) failed in UME with error 22 (Invalid argument).
  valgrind: this can be caused by executables with very large text, data or bss segments.

My program, upsample, statically allocates some very large arrays (2+ GB), such that I have to build with -mcmodel=medium to get it to link.

Is there any workaround for this, such as malloc'ing those arrays instead of statically allocating them (I'm about to go try that, but are there any other suggestions?). 

Thanks.
Comment 17 Michael 2013-08-21 17:55:06 UTC
Hi,

This bug lives on:

$ valgrind --version
valgrind-3.7.0

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.22

$ file mybinary
mybinary: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0xc13071aeef1a606096cc039179da0da7f6113ef6, not stripped

$ valgrind ./mybinary
valgrind: mmap(0x62c000, 3010629632) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
Comment 18 Gregory Czajkowski 2015-05-09 18:05:41 UTC
Cloned r15194 yesterday (valgrind-3.11.0.SVN) and we are running into this bug.

valgrind: mmap(0x400000, 942673920) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

Is there anything I can do to our compile of the binary to prevent this?
Comment 19 Florian Krohm 2015-05-26 09:35:16 UTC
I just read this comment in coregrind/m_aspacemgr/aspacemgr-linux.c

     The available space is delimited by aspacem_minAddr and
     aspacem_maxAddr.  aspacem is flexible and can operate with these
     at any (sane) setting.  ...

     64-bit Linux is similar except for the important detail that the
     upper boundary is set to 64G.  The reason is so that all
     anonymous mappings (basically all client data areas) are kept
     below 64G, since that is the maximum range that memcheck can
     track shadow memory using a fast 2-level sparse array.  It can go
     beyond that but runs much more slowly.  The 64G limit is
     arbitrary and is trivially changed. 

That makes me think that increasing the value of aspacem_maxAddr might get you past that mmap error. Might be worth a try.
Comment 20 Gregory Czajkowski 2015-05-26 18:34:35 UTC
changed all these numbers in configure.ac

-        valt_load_address_pri_norml="0x38000000"                                    
+        valt_load_address_pri_norml="0x68000000"           

It seemed to do the trick, but I'm not sure whether this will render valgrind unusable?
Comment 21 Philippe Waroquiers 2015-05-26 20:48:10 UTC
(In reply to Gregory Czajkowski from comment #20)
> changed all these numbers in configure.ac
> 
> -        valt_load_address_pri_norml="0x38000000"                           
> 
> +        valt_load_address_pri_norml="0x68000000"           
> 
> It seemed to do the trick, but not positive whether this will render
> valgrind unusable?

Should be ok.
As far as I can see, the problem is because you are doing a fixed mmap:
You ask for a segment starting at 0x400000, of size 942673920.
This means that the end of your segment is at 0x400000 + 942673920
which is 0x38701000, while valgrind loads itself by default at 0x38000000
So, your fixed mapping overlaps with some already mapped area (namely the place
where valgrind loads itself).
Your change asks Valgrind to be loaded at 0x68000000, which means that you do not
have a conflict anymore.
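
The arithmetic above can be reproduced with a small Python sketch. The function names are hypothetical, and the overlap test is a simplification: it only asks whether a mapping that starts below the load address extends past it, and ignores how large Valgrind's own image is.

```python
DEFAULT_LOAD_ADDRESS = 0x38000000  # default quoted in this thread


def fixed_mapping_end(start: int, length: int) -> int:
    """First address past a fixed mapping of `length` bytes at `start`."""
    return start + length


def overlaps_loader(start: int, length: int,
                    load_address: int = DEFAULT_LOAD_ADDRESS) -> bool:
    """True if the mapping starts at or below load_address and runs past it."""
    return start <= load_address < fixed_mapping_end(start, length)


if __name__ == "__main__":
    start, length = 0x400000, 942673920  # values from comment 18
    print(hex(fixed_mapping_end(start, length)))
    for load in (0x38000000, 0x68000000):
        print(hex(load), "conflict" if overlaps_loader(start, length, load) else "no conflict")
```

By the same arithmetic, the mappings reported in comment 16 and comment 17 (both larger than 2.8 GB) would still end above 0x68000000, while the 110592-byte mapping in comment 6 ends far below 0x38000000, which fits the separate gold explanation for that report.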

The best approach is not to ask for a fixed mapping, and to let Valgrind decide where
such a big segment is loaded. That is more likely to work.
If you really need to map this segment at that address, then the change you have made
is the right one.

Note that it would be better if the Valgrind aspacemgr gave a more informative error
message when it cannot do an mmap. Currently very little information comes out of
the aspacemgr when an mmap fails, apart from EINVAL :)
Comment 22 Eric White 2015-05-27 16:21:01 UTC
We recently worked around this problem in a similar way, to cope with the large segment sizes in our executables:

valt_load_address_pri_norml="0x38000000" -> "0x68000000"

In our case though we really do have very large text/data segments and are not explicitly mapping using fixed addresses.  Rather valgrind is doing the calls using fixed segments to allocate space for the executable segments.

$ size -x vcs_sim_exe
text                     data                    bss                    dec                     hex                   filename
0x38773ebd      0x1d0a5120      0x1ab050        1436303405      559c402d        vcs_sim_exe
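
As a rough check of these numbers, the Python sketch below sums the three segment sizes and compares an estimated end address against the two load addresses mentioned above. The 0x400000 image base is an assumption (it is the start address seen in other reports in this thread), alignment padding between segments is ignored, and the helper names are hypothetical.

```python
# Segment sizes copied from the `size -x vcs_sim_exe` output above.
TEXT, DATA, BSS = 0x38773EBD, 0x1D0A5120, 0x1AB050

# Assumption: the executable is mapped starting at 0x400000.
ASSUMED_IMAGE_BASE = 0x400000


def estimated_image_end(text: int, data: int, bss: int,
                        base: int = ASSUMED_IMAGE_BASE) -> int:
    """Approximate first address past the image, ignoring padding."""
    return base + text + data + bss


def headroom(load_address: int, text: int, data: int, bss: int,
             base: int = ASSUMED_IMAGE_BASE) -> int:
    """Bytes left between the estimated image end and load_address.
    A negative value means the image would run into the loader."""
    return load_address - estimated_image_end(text, data, bss, base)


if __name__ == "__main__":
    print("total:", TEXT + DATA + BSS)
    for load in (0x38000000, 0x68000000):
        print(hex(load), "headroom:", headroom(load, TEXT, DATA, BSS))
```

Under these assumptions the headroom is negative for 0x38000000 and positive for 0x68000000, but the remaining margin is smaller than the image itself.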

So this seems like a valgrind issue to me, as I'm not sure why it insists on using fixed addresses for these segments, and why it fails if it can't get them.  I wrote a little test program and enabled debug in valgrind to show some extra debug info around the mapping code:

In this instance I create a BSS segment (832m).  This happens to fit from the fixed address valgrind wants to place it at (602000+).  You'll see it approaches the valgrind loader address of 0x38000000.  If I make the segment slightly larger it will fail with the mmap error.

--27118:0:aspacem    0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
--27118:0:aspacem    1: file 0000400000-0000400fff    4096 r-x-- d=0x01f i=1483299 o=0       (1)
--27118:0:aspacem    2: RSVN 0000401000-0000600fff 2097152 ----- SmFixed
--27118:0:aspacem    3: file 0000601000-0000601fff    4096 rw--- d=0x01f i=1483299 o=4096    (1)
--27118:0:aspacem    4: RSVN 0000602000-0003ffffff     57m ----- SmFixed
--27118:0:aspacem    5:      0004000000-0037ffffff    832m
--27118:0:aspacem    6: FILE 0038000000-00383d5fff 4022272 r-x-- d=0x01c i=33433233 o=0       (0)
--27118:0:aspacem    7:      00383d6000-00385d5fff 2097152
--27118:0:aspacem    8: FILE 00385d6000-00385d8fff   12288 rwx-- d=0x01c i=33433233 o=4022272 (0)
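
For anyone wanting to double-check the sizes in this listing, here is a minimal Python sketch that recomputes them from the address ranges. The regular expression and the function name are hypothetical, only three of the lines are reproduced, and the end addresses are treated as inclusive, which is what makes the byte counts match.

```python
import re

# Three lines copied from the aspacem listing above.
LISTING = """\
--27118:0:aspacem    4: RSVN 0000602000-0003ffffff     57m ----- SmFixed
--27118:0:aspacem    5:      0004000000-0037ffffff    832m
--27118:0:aspacem    6: FILE 0038000000-00383d5fff 4022272 r-x-- d=0x01c i=33433233 o=0       (0)
"""

# Matches the "start-end" pair of 10-digit hexadecimal addresses.
RANGE = re.compile(r"([0-9a-fA-F]{10})-([0-9a-fA-F]{10})")


def segment_sizes(listing: str):
    """Return (start, end, size_in_bytes) for every line with a range."""
    result = []
    for line in listing.splitlines():
        match = RANGE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            result.append((start, end, end - start + 1))  # end is inclusive
    return result


if __name__ == "__main__":
    for start, end, size in segment_sizes(LISTING):
        print(f"{start:#012x}-{end:#012x} {size} bytes ({size // (1 << 20)} MiB)")
```

The "m" figures in the listing appear to be whole mebibytes rounded down, which is how the sketch reports them.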

So, I'm not sure why valgrind insists these segment maps be located at 602000 and why it's a hard failure if the segment exceeds around 840MB or so.  Shouldn't it retry the map with a non-fixed address?

It's possible though that even with this patch our executable segment size might bump up against the loader address again, so I'm hoping we can figure out a robust fix for this, or at least have some configure options available to avoid it more easily (and maybe something in the FAQ?).

I did try to move the address higher up ("0x38000000" -> "0xA8000000") to give us more headroom, but ran into linker relocation errors.  I noticed that Darwin uses 0x138000000. We'd be okay with a 64-bit-only build as well, I think, to avoid issues like this, but I hit relocation errors on 'make' when trying that too.