| Summary: | Valgrind Abort with "failed in UME with error 22" | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Doug McGrath <doug> |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | ASSIGNED | | |
| Severity: | major | CC: | atwilson, chris.m.gibson, dank, ericsiums, flo2030, gregczajkowski, konstantin.s.serebryany, matt, mdunphy, mooreb, njn, pete.flugstad, philippe.waroquiers, timurrrr, tom |
| Priority: | NOR | | |
| Version: | 3.2.1 | | |
| Target Milestone: | --- | | |
| Platform: | Compiled Sources | | |
| OS: | Linux | | |
| See Also: | https://bugs.kde.org/show_bug.cgi?id=352384 | | |
| Latest Commit: | | Version Fixed In: | |
| Sentry Crash Report: | | | |
Description
Doug McGrath
2006-12-06 02:24:23 UTC
*** Bug 138856 has been marked as a duplicate of this bug. ***

Can't easily fix this, but I did commit a change to make it print a more intelligible error message. See bug 138856 for background.

I'm also seeing this in version 3.2.2 for an application written in Fortran and compiled as 64-bit on x86_64. This is a problem because large static arrays are a fact of life with f77.

I'm closing crashing and similar bugs that are more than two years old. If you still see this problem with Valgrind 3.4.1 please reopen the bug report. Thanks.

Nicholas, I can confirm that with valgrind 3.5.0 this issue still exists.

I can verify that this happens on Ubuntu 9.10 on amd64 with current SVN:

    $ svn update
    $ svn info
    Path: .
    URL: svn://svn.valgrind.org/valgrind/trunk
    Repository Root: svn://svn.valgrind.org/valgrind
    Repository UUID: a5019735-40e9-0310-863c-91ae7b9d1cf9
    Revision: 11027
    Node Kind: directory
    Schedule: normal
    Last Changed Author: bart
    Last Changed Rev: 11027
    Last Changed Date: 2010-01-17 03:02:23 -0800 (Sun, 17 Jan 2010)
    $ make distclean
    [...]
    $ autoreconf
    $ ./configure --prefix=/home/matt
    [...]
    $ make install
    [...]
    $ ~/bin/valgrind /usr/bin/ls
    valgrind: mmap(0x400000, 110592) failed in UME with error 22 (Invalid argument).
    valgrind: this can be caused by executables with very large text, data or bss segments.

I can't get it to work with any binary. I have tried --enable-only64bit and --enable-inner to see if either would work around the issue, and neither did. Let me know if you need my config.log or anything else to enable further debugging.

We are seeing a similar problem running our Chromium tests under valgrind (more info: http://code.google.com/p/chromium/issues/detail?id=28439). Worker tests run under valgrind sporadically get an mmap error 22, which is odd because it looks like none of our static resources are larger than 0.5MB.

BTW, I'd note that this bug is marked as NEEDSINFO/INVALID. I think we've provided enough information to reopen the bug, but my bugzilla-fu is clearly not up to the task of changing the status to REOPENED (perhaps I don't have permission for this?).

(In reply to comment #6)
> I can verify that this happens on Ubuntu 9.10 on amd64 with current SVN:

I don't disbelieve your report, but OTOH (1) we would never have shipped 3.5.0 if it was so obviously broken, and (2) the current SVN works fine for me on Ubuntu 9.10 amd64:

    $ cat /etc/issue
    Ubuntu 9.10 \n \l
    $ uname -a
    Linux nienna 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 02:39:34 UTC 2010 x86_64 GNU/Linux
    $ ./vg-in-place -q date
    Mon Feb 15 11:30:42 CET 2010

So I am inclined to believe that the failure you are seeing is the result of some local configuration difference that is not present in a vanilla 9.10 install. The problem is I'm not sure what we're looking for here. Do you have some unusual ulimit setting, or some enhanced security settings? Any other differences from a vanilla install?

Just to clarify: there are really two different problems here. This failure will occur for executables with huge text, data or bss segments; but "huge" means, like, 500MB kind of size. That's clearly what happened for the original report (comment #0) and for comment #3. This is a known and understood problem, which isn't easy to fix; but at least we know what the problem is. The only known workaround is to try again with a 64-bit process rather than a 32-bit one. Now, it's clear that there is some second set of failures which do not involve huge segments, as per comment #6, comment #7, comment #8. Obviously I would like to fix this, but I do not know what the problem is, and you (collectively) will need to provide more info, or ideally a simple case that reproduces this on a vanilla Ubuntu install. Comment #5 is ambiguous; I can't see whether it falls into the "huge segment" category or the "mysterious other failure" category. Reopening.
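For the first ("huge segment") category, a minimal reproducer is easy to sketch. The following is illustrative only and not taken from this thread; the 900 MB figure is an assumption, chosen so that an executable mapped at the usual 0x400000 base has its zero-initialised data reach past Valgrind's default 0x38000000 load address, in line with the sizes reported to fail later in the thread:

```c
/* Hypothetical reproducer for the "huge bss" failure mode: a large
 * zero-initialised static array lands in the bss segment, which the
 * Valgrind loader must map at the executable's fixed address. */
#include <stdio.h>

#define BIGSIZE (900UL * 1024 * 1024)   /* ~900 MB of zero-initialised (bss) data */

static char big[BIGSIZE];

int main(void)
{
    big[0] = 1;                          /* touch the array so it is clearly used */
    big[BIGSIZE - 1] = 1;
    printf("touched %lu bytes of static data\n", BIGSIZE);
    return 0;
}
```

Built with a plain `gcc -o big big.c` and run as `valgrind ./big`, a program of roughly this shape is the kind expected to hit the `failed in UME with error 22` message on a default build, even though it runs fine outside Valgrind.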
The problem in #6 might be bug 193413. Matt, what does ld --version say? If it mentions 'gold', you need to rebuild valgrind with plain old ld, not with gold (at least until bug 193413 is fixed).

(In reply to comment #12) Ah yes, excellent point. So the test is to do

    readelf -a memcheck/memcheck-amd64-linux | grep "Entry point address"

and we need to get a number of the form 0x38000000 plus a little bit, for example 0x38030980. If so then there should be no problem. If the result is very different (e.g. 0x402430-ish or 0x8048420) then it has been linked with gold and will fail for all programs, even the smallest.

Yup, I'm using gold as my system linker, as I've been testing GCC 4.5's LTO feature quite a bit recently. I noticed that in Dan's binutils bug they say exactly how valgrind needs to be updated -- are there any plans to do that soon? Use of gold is likely to go up as people start using GCC 4.5.

Sorry for the confusion! Just to clarify, the Chromium Issue #28439 mentioned above happens even though Valgrind is NOT linked with gold there.

I can confirm that this bug exists on Ubuntu 11.10 (x86_64) with valgrind 3.6.1:

    $ valgrind --version
    valgrind-3.6.1-Debian
    $ cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=11.10
    DISTRIB_CODENAME=oneiric
    DISTRIB_DESCRIPTION="Ubuntu 11.10"
    $ uname -a
    Linux atlas 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:27:26 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
    $ ld --version
    GNU ld (GNU Binutils for Ubuntu) 2.21.53.20110810
    $ file upsample
    upsample: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
    $ valgrind upsample
    ...
    valgrind: mmap(0x654000, 2836611072) failed in UME with error 22 (Invalid argument).
    valgrind: this can be caused by executables with very large text, data or bss segments.

My program, upsample, statically allocates some very large arrays (2+ GB), such that I have to build with -mcmodel=medium to get it to link. Is there any workaround for this, such as malloc'ing those arrays instead of statically allocating them? (I'm about to go try that, but are there any other suggestions?) Thanks.

Hi, this bug lives on:

    $ valgrind --version
    valgrind-3.7.0
    $ cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=12.04
    DISTRIB_CODENAME=precise
    DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"
    $ ld --version
    GNU ld (GNU Binutils for Ubuntu) 2.22
    $ file mybinary
    mybinary: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0xc13071aeef1a606096cc039179da0da7f6113ef6, not stripped
    $ valgrind ./mybinary
    valgrind: mmap(0x62c000, 3010629632) failed in UME with error 22 (Invalid argument).
    valgrind: this can be caused by executables with very large text, data or bss segments.

Cloned r15194 yesterday (valgrind-3.11.0.SVN) and we are running into this bug:

    valgrind: mmap(0x400000, 942673920) failed in UME with error 22 (Invalid argument).
    valgrind: this can be caused by executables with very large text, data or bss segments.

Is there anything I can do in compiling our binary to prevent this?
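For the reporters above whose executables really do contain multi-GB static arrays (the upsample case and the r15194 case), the workaround the upsample author asks about — moving the arrays from static storage to the heap so the bss stays small — can be sketched as below. This is illustrative only; the array name and size are hypothetical and not taken from the thread:

```c
/* Sketch of the heap-allocation workaround asked about above: instead of a
 * multi-GB zero-initialised static array (which inflates the bss segment and
 * forces a huge fixed mapping), allocate the storage at run time. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSAMPLES (400UL * 1024 * 1024)          /* 400M doubles, roughly 3.2 GB */

/* Problematic form: static double samples[NSAMPLES];   -- lands in bss */

int main(void)
{
    double *samples = malloc(NSAMPLES * sizeof *samples);   /* heap instead of bss */
    if (samples == NULL) {
        fprintf(stderr, "allocation of %lu doubles failed\n", NSAMPLES);
        return 1;
    }
    memset(samples, 0, NSAMPLES * sizeof *samples);  /* static arrays start zeroed */
    /* ... use samples[] exactly as before ... */
    free(samples);
    return 0;
}
```

Whether this is practical depends on the code base (the Fortran report earlier in the thread may not have that option without source changes); the alternative discussed in the following comments is to raise the address at which Valgrind loads itself.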
I just read this comment in coregrind/m_aspacemgr/aspacemgr-linux.c:

> The available space is delimited by aspacem_minAddr and aspacem_maxAddr. aspacem is flexible and can operate with these at any (sane) setting. ... 64-bit Linux is similar except for the important detail that the upper boundary is set to 64G. The reason is so that all anonymous mappings (basically all client data areas) are kept below 64G, since that is the maximum range that memcheck can track shadow memory using a fast 2-level sparse array. It can go beyond that but runs much more slowly. The 64G limit is arbitrary and is trivially changed.

That makes me think that increasing the value of aspacem_maxAddr might get you past that mmap error. Might be worth a try.

I changed all these numbers in configure.ac:

    - valt_load_address_pri_norml="0x38000000"
    + valt_load_address_pri_norml="0x68000000"

It seemed to do the trick, but I'm not positive whether this will render valgrind unusable?

(In reply to Gregory Czajkowski from comment #20)
> changed all these numbers in configure.ac
>
> - valt_load_address_pri_norml="0x38000000"
> + valt_load_address_pri_norml="0x68000000"
>
> It seemed to do the trick, but not positive whether this will render valgrind unusable?

Should be OK. As far as I can see, the problem is that you are doing a fixed mmap: you ask for a segment starting at 0x400000, of size 942673920. This means the end of your segment is at 0x400000 + 942673920, which is 0x38701000, while valgrind loads itself by default at 0x38000000. So your fixed mapping overlaps with an already mapped area (namely the place where valgrind loads itself). Your change asks Valgrind to load itself at 0x68000000 instead, which means you no longer have a conflict. The best approach is not to ask for a fixed mapping at all and to let Valgrind decide where such a big segment should be loaded; that has a better chance of working. If you really need to map this segment at that address, then the change you made is the right one. Note that it would be better if Valgrind's aspacemgr gave a more informative error message when it cannot do an mmap. There is currently very little information coming out of the aspacemgr when an mmap fails, except EINVAL :)
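To make the overlap arithmetic above concrete, here is a small illustrative calculation (a sketch, not Valgrind code) using the exact numbers from the failing mmap quoted in this thread; it shows why moving the load address to 0x68000000 removes the conflict:

```c
/* Illustrative arithmetic only: check whether a fixed client mapping would
 * overlap the address at which Valgrind loads itself. The numbers are the
 * ones from the failing run quoted above. */
#include <stdio.h>

int main(void)
{
    unsigned long client_base = 0x400000UL;    /* start of the fixed mmap              */
    unsigned long client_size = 942673920UL;   /* size requested in the error message  */
    unsigned long tool_load   = 0x38000000UL;  /* default valt_load_address_pri_norml  */
    unsigned long new_load    = 0x68000000UL;  /* value after the configure.ac edit    */

    unsigned long client_end  = client_base + client_size;

    printf("client segment ends at 0x%lx\n", client_end);       /* 0x38701000 */
    printf("overlaps 0x38000000 load address: %s\n",
           client_end > tool_load ? "yes" : "no");               /* yes */
    printf("overlaps 0x68000000 load address: %s\n",
           client_end > new_load ? "yes" : "no");                /* no  */
    return 0;
}
```

The same arithmetic applies to the vcs_sim_exe case below, where an 832 MB bss reservation just fits below 0x38000000 and a slightly larger one does not.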
We just recently solved this problem in the same manner, to get around large segment sizes in our executables:

    valt_load_address_pri_norml="0x38000000" -> "0x68000000"

In our case, though, we really do have very large text/data segments and are not explicitly mapping anything at fixed addresses. Rather, valgrind itself is making the fixed-address calls to allocate space for the executable's segments.

    $ size -x vcs_sim_exe
       text        data       bss        dec         hex        filename
    0x38773ebd  0x1d0a5120  0x1ab050   1436303405   559c402d   vcs_sim_exe

So this seems like a valgrind issue to me, as I'm not sure why it insists on using fixed addresses for these segments, and why it fails hard if it can't get them. I wrote a little test program and enabled debugging in valgrind to show some extra information around the mapping code. In this instance I create a bss segment of 832m, which happens to fit starting from the fixed address valgrind wants to place it at (0x602000+); you'll see it approaches the valgrind loader address of 0x38000000. If I make the segment slightly larger, it fails with the mmap error.

    --27118:0:aspacem 0: RSVN 0000000000-00003fffff 4194304 ----- SmFixed
    --27118:0:aspacem 1: file 0000400000-0000400fff 4096 r-x-- d=0x01f i=1483299 o=0 (1)
    --27118:0:aspacem 2: RSVN 0000401000-0000600fff 2097152 ----- SmFixed
    --27118:0:aspacem 3: file 0000601000-0000601fff 4096 rw--- d=0x01f i=1483299 o=4096 (1)
    --27118:0:aspacem 4: RSVN 0000602000-0003ffffff 57m ----- SmFixed
    --27118:0:aspacem 5:      0004000000-0037ffffff 832m
    --27118:0:aspacem 6: FILE 0038000000-00383d5fff 4022272 r-x-- d=0x01c i=33433233 o=0 (0)
    --27118:0:aspacem 7:      00383d6000-00385d5fff 2097152
    --27118:0:aspacem 8: FILE 00385d6000-00385d8fff 12288 rwx-- d=0x01c i=33433233 o=4022272 (0)

So I'm not sure why valgrind insists that these segment maps be located at 0x602000, and why it's a hard failure if the segment exceeds around 840MB or so. Shouldn't it retry the map with a non-fixed address? It's possible that even with this patch our executable's segment sizes will bump up against the loader address again, so I'm hoping we can figure out a robust fix for this, or at least have some configure options available to avoid it more easily (and maybe something in the FAQ?). I did try to move the address higher up, "0x38000000" -> "0xA8000000", to give us more headroom, but ran into linker relocation errors. I noticed that Darwin uses 0x138000000. We would also be OK with a 64-bit-only build to avoid issues like this, but I hit relocation errors on 'make' when trying that as well.