Summary: | mmap of huge pages fails due to incorrect alignment | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Tommy Janjusic <tommy.janjusic> |
Component: | general | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | aschorr, muppakrishna1, nate, philippe.waroquiers, tom |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
See Also: | https://bugs.kde.org/show_bug.cgi?id=339163 | ||
Latest Commit: | Version Fixed In: | ||
Attachments: | adding another layer on mmap, ignoring the MAP_FIXED argument |
Description
Tommy Janjusic
2014-04-03 22:18:19 UTC
Created attachment 85950 [details]
adding another layer on mmap, ignoring the MAP_FIXED argument
there could be a better way of handling this by giving another hint to space_manager that we are looking for a particular alignment. for huge pages and MAP_FIXED 4k alignment is not sufficient.
eg.
if(size>8MB) {give me 8MB aligned address}
elseif(size>4MB{give me 4MB aligned address}
elseif(size>2MB{give me 2MB aligned address}
elseif(size>1MB{give me 1MB aligned address}
else just give me 4k aligned….
*** Bug 339163 has been marked as a duplicate of this bug. *** I applied Tommy's patch to 3.10.0 and it allowed the mmap() of both 2MB and 1GB hugepages to work correctly. Post-patch results are below. See https://bugs.kde.org/show_bug.cgi?id=339163 for the pre-patch results and the testcase used. nate@haswell:~$ valgrind valgrind_hugepage_mmap_bug 4KB ==2910== Memcheck, a memory error detector ==2910== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==2910== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==2910== Command: valgrind_hugepage_mmap_bug 4KB ==2910== Opening /dev/zero Trying mmap() for 4KB pages Writing 100000 bytes. Success ==2910== ==2910== HEAP SUMMARY: ==2910== in use at exit: 0 bytes in 0 blocks ==2910== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==2910== ==2910== All heap blocks were freed -- no leaks are possible ==2910== ==2910== For counts of detected and suppressed errors, rerun with: -v ==2910== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) nate@haswell:~$ valgrind valgrind_hugepage_mmap_bug 2MB ==2911== Memcheck, a memory error detector ==2911== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==2911== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==2911== Command: valgrind_hugepage_mmap_bug 2MB ==2911== Opening /var/lib/hugetlbfs/global/pagesize-2097152/page Trying mmap() for 2MB pages Writing 100000 bytes. Success ==2911== ==2911== HEAP SUMMARY: ==2911== in use at exit: 0 bytes in 0 blocks ==2911== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==2911== ==2911== All heap blocks were freed -- no leaks are possible ==2911== ==2911== For counts of detected and suppressed errors, rerun with: -v ==2911== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) nate@haswell:~$ valgrind valgrind_hugepage_mmap_bug 1GB ==2913== Memcheck, a memory error detector ==2913== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==2913== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==2913== Command: valgrind_hugepage_mmap_bug 1GB ==2913== Opening /var/lib/hugetlbfs/global/pagesize-1073741824/page Trying mmap() for 1GB pages Writing 100000 bytes. Success ==2913== ==2913== HEAP SUMMARY: ==2913== in use at exit: 0 bytes in 0 blocks ==2913== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==2913== ==2913== All heap blocks were freed -- no leaks are possible ==2913== ==2913== For counts of detected and suppressed errors, rerun with: -v ==2913== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) nate@haswell:~$ strace valgrind valgrind_hugepage_mmap_bug 4KB ... mmap(0x4026000, 100000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x4026000 nate@haswell:~$ strace valgrind valgrind_hugepage_mmap_bug 2MB ... mmap(0x4026000, 100000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EINVAL (Invalid argument) mmap(0x4026000, 100000, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x2aaaaac00000 nate@haswell:~$ strace valgrind valgrind_hugepage_mmap_bug 1GB ... mmap(0x4026000, 100000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = -1 EINVAL (Invalid argument) mmap(0x4026000, 100000, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x40000000 Well of course it works. it's just completely wrong because it removes valgrind's control over where things are mapped - the MAP_FIXED is added on purpose, to ensure that things are mapped where valgrind wants them. The correct fix is for aspacemgr to be told to select an address with the correct alignment, but that requires us being able to tell that we are dealing with huge pages. (In reply to Tom Hughes from comment #4) > Well of course it works. it's just completely wrong because it removes > valgrind's control over where things are mapped - the MAP_FIXED is added on > purpose, to ensure that things are mapped where valgrind wants them. > > The correct fix is for aspacemgr to be told to select an address with the > correct alignment, but that requires us being able to tell that we are > dealing with huge pages. I've noted this in my original comment I thought... >adding another layer on mmap, ignoring the MAP_FIXED argument >there could be a better way of handling this by giving another hint to space_manager that we are >looking for a particular alignment. for huge pages and MAP_FIXED 4k alignment is not sufficient. I was told people are reluctant to fiddle with aspacemgr :), so this worked for me, thought maybe others may find it useful as well. Speaking of, are there any immediate consequences that you can foresee of taking away control over huge_pages from Valgrind? *Well of course it works.* From a user's point of view, it's not clear that this is always a negative. If the issue is that Valgrind is unable to correctly monitor the page, perhaps there could be a warning to this effect? If the issue is that performance is worse, I'd suggest this is better than not being able to run at all. Using MAP_HUGETLB if available to decide the correct alignment would be fine, but it would be preferable to also support allocations from hugetlbfs, since on older Linux kernels this is the only way to get access to 1GB pages. If relying on MAP_HUGETLB, this might be a useful testcase: http://lxr.free-electrons.com/source/tools/testing/selftests/vm/thuge-gen.c, as it tests both 2MB and 1GB hugepages. On Thu, Sep 18, 2014 at 10:48 AM, Tom Hughes <tom@compton.nu> wrote: > https://bugs.kde.org/show_bug.cgi?id=333051 > > --- Comment #4 from Tom Hughes <tom@compton.nu> --- > Well of course it works. it's just completely wrong because it removes > valgrind's control over where things are mapped - the MAP_FIXED is added on > purpose, to ensure that things are mapped where valgrind wants them. > > The correct fix is for aspacemgr to be told to select an address with the > correct alignment, but that requires us being able to tell that we are dealing > with huge pages. > > -- > You are receiving this mail because: > You are on the CC list for the bug. (In reply to Tom Hughes from comment #4) > Well of course it works. it's just completely wrong because it removes > valgrind's control over where things are mapped - the MAP_FIXED is added on > purpose, to ensure that things are mapped where valgrind wants them. I do not think it is completely wrong, even if not ideal. As I understand, the aspacemgr tries to control where things are mapped mostly for efficiency reasons on 64 bits platforms, due to the memcheck address maps, which becomes a lot less efficient once some 'high addresses' are mapped. With the patch, if the kernel did a mapping at a certain address, then normally, the aspacemgr data structures should indicate that this zone is available for the client. So, to as a last resort let the kernel decide of the mapping is somewhat similar to have a good advice about where to do the mapping. If then it is checked that the mapping is in fact acceptable (ie. is effectively at a free place, usable by the client), then that will solve the problem an easy way. Note that a similar trick (retry without the MMAP_FIXED) is already used in aspacemgr for darwin and for the outer/inner setup. Note that the aspacemgr trick is really a kludge, as this in fact bypasses an incorrect knowledge of the mapping by the inner valgrind (due to the outer valgrind having done a mmap not observed by the inner valgrind). In the case of syswrap-generic.c, I however think that the aspacemgr state will be fully correct after a successful (and verified as ok for the client) mapping done by the kernel. What the attached patch misses is a verification that the resulting mmap is effectively ok. See also the discussion in http://sourceforge.net/p/valgrind/mailman/message/32167895/ and following posts. > > The correct fix is for aspacemgr to be told to select an address with the > correct alignment, but that requires us being able to tell that we are > dealing with huge pages. That would effectively be better, but implies to learn to valgrind the concept of hugepages and/or huge page file systems and/or ... (whatever the way this is working). The trick to "ask the kernel an advice and do the mmap at the same time" seems a lot simpler/portable, and looks good enough (assuming I have not missed a problem with the approach of retrying without MMAP_FIXED) Fixed in revision 14682 |