Bug 287862 - MPI_IN_PLACE not supported for MPI collectives
Summary: MPI_IN_PLACE not supported for MPI collectives
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.7.0
Platform: Compiled Sources
OS: Linux
Priority: NOR
Severity: wishlist
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-29 20:20 UTC by Jeremiah Willcock
Modified: 2025-11-18 14:12 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
patch against valgrind-3.11.0/mpi/libmpiwrap.c (55.74 KB, patch)
2015-10-21 10:08 UTC, Emmanuel Thomé
test case for MPI wrappers (28.18 KB, text/x-csrc)
2015-10-21 10:09 UTC, Emmanuel Thomé

Description Jeremiah Willcock 2011-11-29 20:20:51 UTC
Version:           3.7.0 (using KDE 1.2) 
OS:                Linux

The MPI_IN_PLACE symbol is not allowed in place of a send buffer for the MPI collectives in libmpiwrap.so.  For example, the following call:

    int x = 1;
    MPI_Allreduce(MPI_IN_PLACE, &x, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

produces the error:

==672== Unaddressable byte(s) found during client check request
==672==    at 0x4C153FD: check_mem_is_defined_untyped (libmpiwrap.c:950)
==672==    by 0x4C15E98: walk_type (libmpiwrap.c:688)
==672==    by 0x4C1CD8C: PMPI_Allreduce (libmpiwrap.c:921)
==672==    by 0x468998: main (bfs.cpp:346)
==672==  Address 0x1 is not stack'd, malloc'd or (recently) free'd

MPI treats the pointer value 1 as MPI_IN_PLACE, and does not try to dereference it.
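
A minimal standalone reproducer along these lines might look as follows (a sketch only, assuming an MPI-2 or later implementation; build with mpicc and run under valgrind with libmpiwrap.so preloaded):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int x = 1;
        MPI_Init(&argc, &argv);
        /* x serves as both the send and the receive buffer. */
        MPI_Allreduce(MPI_IN_PLACE, &x, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        printf("sum = %d\n", x);
        MPI_Finalize();
        return 0;
    }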

Reproducible: Always

Steps to Reproduce:
See details.

Actual Results:  
See details.

Expected Results:  
Treating the receive buffer in the call as both the send and receive buffers for the collective operation.

I am using gcc 4.6.0 and Open MPI 1.5 on x64.
Comment 1 Julian Seward 2012-01-25 14:52:56 UTC
Hmm, it might be fastest if you fixed this, since it's not hard
to do and you obviously have a test rig close to hand.

In function WRAPPER_FOR(PMPI_Allreduce) I would *guess* you need
to change

   check_mem_is_defined(sendbuf, count, datatype);

to

   if (sendbuf != MPI_IN_PLACE)
      check_mem_is_defined(sendbuf, count, datatype);

(I am saying this on the basis of zero knowledge of the meaning
of MPI_IN_PLACE, so ymmv ..)

One concern is: does MPI_IN_PLACE exist for older MPIs (v1.1)?
If not, it will be necessary to conditionalise it somehow.
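
A minimal sketch of how that conditionalisation could look, assuming (unverified here for every MPI) that MPI_IN_PLACE, where it exists, is a preprocessor macro:

   #if defined(MPI_IN_PLACE)
      /* MPI-2 and later: skip the definedness check for an in-place send. */
      if (sendbuf != MPI_IN_PLACE)
         check_mem_is_defined(sendbuf, count, datatype);
   #else
      /* No MPI_IN_PLACE in this MPI; keep the unconditional check. */
      check_mem_is_defined(sendbuf, count, datatype);
   #endif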
Comment 2 Emmanuel Thomé 2015-10-15 09:29:57 UTC
Hi,

Digging up an old bug (which is still present and just hit me).

I can fix this and provide test cases, hopefully for all collectives in chapter 5 of the mpi-3.[01] document.

One question I have, though, is how to proceed with the non-blocking collectives introduced in MPI-3.0.

The validity checks for the input can of course be made before posting the non-blocking collective call (e.g. MPI_Ibcast). However, qualifying the output values would ideally have to wait for the matching MPI_Wait(|all|any) call to complete, which then opens a can of worms. I don't see an easy way to do that in a binary-compatible way.
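
To make the shape of the problem concrete, a rough sketch of the two-phase handling could be (my_rank, remember_pending_write, retire_pending_write, struct pending_write and mark_mem_defined are hypothetical names introduced for illustration, not things that exist in libmpiwrap.c):

   /* Posting side, sketched for MPI_Ibcast: at the root the buffer is an
      input and can be checked right away; on the other ranks it is an
      output, so it can only be recorded for later. */
   if (my_rank == root)
      check_mem_is_defined(buffer, count, datatype);
   else
      remember_pending_write(*request, buffer, count, datatype);  /* hypothetical */

   /* Completion side, sketched for MPI_Wait: only once the request has
      completed may the recorded areas be marked as defined. */
   struct pending_write pw;                                       /* hypothetical */
   while (retire_pending_write(completed_request, &pw))           /* hypothetical */
      mark_mem_defined(pw.buf, pw.count, pw.datatype);            /* hypothetical */

The binary-compatibility worry is precisely that the second half has to live in the MPI_Wait family of wrappers, far from the call that knows what the buffers were.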

Ideas ?

E.
Comment 3 Emmanuel Thomé 2015-10-20 21:09:28 UTC
I'm on it right now. I've got all collectives with and without MPI_IN_PLACE covered.

I see the shadow_request mechanism, which is about all I need to address the issue I raised about the non-blocking collectives.

The only catch is with MPI_Igatherv, for instance, where the completion of a single request is going to paint several memory areas. I'm considering changing a bit how the shadow_request table works, as follows (a simplified sketch of the second point follows the list):
 - forbid replacing an entry which is still marked in use. Such an event looks to me more like a bug than a desirable thing.
 - allow a 1-to-n mapping in there.
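
Purely as an illustration of the 1-to-n idea, a standalone, simplified sketch with made-up names (and an unsigned long standing in for an MPI_Request handle; this is not the actual shadow_request data structure in libmpiwrap.c):

   #include <stdio.h>
   #include <stddef.h>

   /* One recorded output area.  A single request handle may own several
      of these (the 1-to-n case, e.g. MPI_Igatherv at the root). */
   struct shadow_area {
      unsigned long req;   /* opaque stand-in for an MPI_Request handle */
      void         *buf;
      size_t        len;
      int           in_use;
   };

   static struct shadow_area table[64];

   /* Record one more area for a request; never overwrite a live entry. */
   static int add_area(unsigned long req, void *buf, size_t len)
   {
      for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
         if (!table[i].in_use) {
            table[i].req = req;  table[i].buf = buf;
            table[i].len = len;  table[i].in_use = 1;
            return 1;
         }
      return 0;   /* table full */
   }

   /* On completion of req, retire and hand back each recorded area in turn. */
   static int take_area(unsigned long req, struct shadow_area *out)
   {
      for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
         if (table[i].in_use && table[i].req == req) {
            *out = table[i];
            table[i].in_use = 0;
            return 1;
         }
      return 0;
   }

   int main(void)
   {
      char a[8], b[16];
      struct shadow_area sa;
      add_area(42, a, sizeof a);    /* two areas recorded under one    */
      add_area(42, b, sizeof b);    /* request, as MPI_Igatherv needs  */
      while (take_area(42, &sa))    /* at completion: paint each area  */
         printf("mark %zu bytes defined at %p\n", sa.len, sa.buf);
      return 0;
   }

In the real wrappers, the printf line is of course where the completed areas would get painted as defined.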

ok ?

E.
Comment 4 Emmanuel Thomé 2015-10-21 10:08:20 UTC
Created attachment 95071 [details]
patch against valgrind-3.11.0/mpi/libmpiwrap.c
Comment 5 Emmanuel Thomé 2015-10-21 10:09:10 UTC
Created attachment 95072 [details]
test case for MPI wrappers
Comment 6 Emmanuel Thomé 2015-10-21 10:10:58 UTC
Ok, here's a proposed patch against valgrind-3.11.0/mpi/libmpiwrap.c, as well as a test case.
All collectives from chapter 5 are covered, but I haven't covered the whole MPI-3 document.
There's a slight catch, which may be either a gap in the MPI-3 document or a misunderstanding on my part (concerning the completion of some non-blocking collectives and the use of MPI_Get_count).
Comment 7 Alex 2018-09-12 22:20:23 UTC
Is there a reason nothing was ever done about Emmanuel's patch? We observed these errors in our LibMesh project, where we make use of MPI_IN_PLACE.
Comment 8 Philippe Waroquiers 2018-10-20 10:07:49 UTC
(In reply to Alex from comment #7)
> Is there a reason nothing was ever done about Emmanuel's patch? We observed
> these errors in our LibMesh project, where we make use of MPI_IN_PLACE.

I guess that this is due to a combination of:
  * there are not a lot of valgrind developers, so not much free time
  * not much (or zero?) knowledge of MPI among the valgrind developers
    (personally, my knowledge is zero).

This is not a minor patch, so it would be better if someone with MPI
knowledge reviewed the code and the test case, and gave feedback
based on a real application.

That might make it ok to push in 3.15 ...
Comment 9 Pierre Marchand 2025-05-02 18:07:05 UTC
I got hit by this bug also. Emmanuel seems to have done most of the heavy lifting, but I understand it takes time to review.

That being said, it could be mentioned in the documentation, especially since there is already a paragraph about false positives with reduce operations.
Comment 10 Mark Wielaard 2025-10-17 12:26:34 UTC
I don't have enough experience with MPI to do a proper review, but I am commenting to make sure this isn't forgotten. The patch still applies almost as-is, with just one small conflict, against commit 7b1a2b1edd99f15e23be0d259498247367f1e457 ("Fix printf warning in libmpiwrap.c").
Comment 11 Emmanuel Thomé 2025-11-18 14:12:45 UTC
Hi.

It's slightly funny to notice that this patch is now ten years old.

If it still applies (I take your word for it), then it probably still has some value. What can be envisioned as a path forward?

E.