Created attachment 62827 [details] patch fixing mempolicy syscall checks Version: 3.7 SVN (using Devel) OS: Linux Hello, While debugging hwloc, I got the following valgrind report: ==7528== Syscall param mbind(nodemask) points to unaddressable byte(s) ==7528== at 0x6623E3A: ??? (in /usr/lib/libnuma.so.1) ==7528== by 0x6623EA6: mbind (in /usr/lib/libnuma.so.1) ==7528== by 0x5048A77: hwloc_linux_set_area_membind (in /home/bgoglin/SOFT/hwloc/trunk/build/src/.libs/libhwloc.so.0.0.0) ==7528== by 0x50402C5: hwloc_set_area_membind_nodeset (in /home/bgoglin/SOFT/hwloc/trunk/build/src/.libs/libhwloc.so.0.0.0) ==7528== by 0x4059ED: main (in /home/bgoglin/SOFT/hwloc/trunk/build/utils/.libs/lt-lstopo) A careful review of the mbind manpage and hwloc code confirms that hwloc is right. Valgrind is however doing an invalid check on nodemask. And there are actually 3 problems in the mempolicy section of coregrind/m_syswrap/syswrap-linux.c: 1) Compare the checks in PRE(sys_mbind): PRE_MEM_READ( "mbind(nodemask)", ARG4, VG_ROUNDUP( ARG5, sizeof(UWord) ) / sizeof(UWord) ); PRE(sys_set_mempolicy): PRE_MEM_READ( "set_mempolicy(nodemask)", ARG2, VG_ROUNDUP( ARG3, sizeof(UWord) ) / sizeof(UWord) ); with PRE(sys_get_mempolicy): PRE_MEM_WRITE( "get_mempolicy(nodemask)", ARG2, VG_ROUNDUP( ARG3, sizeof(UWord) * 8 ) / sizeof(UWord) ); The VG_ROUNDUP is wrong in mbind/set_mempolicy while it's OK in get_mempolicy. maxnode is the max index of a bit in nodemask, it's not the allocated nodemask size. So if you want to round up to the nearest multiple of bits per long, you want VG_ROUNDUP(..., sizeof(UWord)*8) instead of VG_ROUNDUP(..., sizeof(UWord)). This affects PRE(sys_bind) and PRE(sys_set_mempolicy) 2) Dividing the result of VG_ROUNDUP by sizeof(UWord) is wrong too. You want to convert bits into bytes there. So you want to divide by 8 (or BITS_PER_BYTE if it exists). This works on 64bits by chance but it should break on 32bits (I don't have any machine to test this). This affects all PRE/POST in the mempolicy section. 3) The tricky one. maxnode should be X when nodemask may contain up to X bits (from 0 to X-1). That's what the manpages say (even if it's not very clear). However, the actual implementation in the kernel is different. It wants maxnode to be increased by 1 again. If you look at mm/mempolicy.c in the kernel (from 2.6.12 to 3.1rc1), you'll see that the maxnode/nodemask given to mbind and set_mempolicy are checked in get_nodes(). This function starts by doing maxnode-- and then does nothing if maxnode is 0. This means that if you want to set the first bit in nodemask, you need maxnode to be 2 or more. So the aforementioned checks should be updated again by decreasing maxnode by 1 in VG_ROUNDUP. I am attaching a patch including all these fixes. Should apply to 3.6.1 (Debian testing) and 3.6.99svn11761 (Debian experimental). Reproducible: Didn't try Steps to Reproduce: see above Expected Results: see above
Committed (with the extra -1 added to get_mempolicy) as r11977.