Bug 401416

Summary: Compile failure with openmpi 4.0
Product: [Developer tools] valgrind Reporter: Orion Poplawski <orion>
Component: generalAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal CC: balint, mark
Priority: NOR    
Version: 3.14.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Stop using symbols dropped in openmpi 4.0
Only use MPI1 symbols when properly defined

Description Orion Poplawski 2018-11-26 05:02:12 UTC
SUMMARY

OpenMPI drops support for MPI1 by default.  This leads to a compile failure:

Making all in mpi
make[2]: Entering directory `/builddir/build/BUILD/valgrind-3.14.0/mpi'
/usr/lib64/openmpi/bin/mpicc    -I../include  -g -O -fno-omit-frame-pointer -Wall -fpic -m64 -Wno-deprecated-declarations  -c -o libmpiwrap_ppc64le_linux_so-libmpiwrap.o `test -f 'libmpiwrap.c' || echo './'`libmpiwrap.c
for f in ; do \
  if [ ! -e $f.dSYM  -o  $f -nt $f.dSYM ] ; then \
      echo "dsymutil $f"; \
      dsymutil $f; \
  fi; \
done
mkdir -p ../.in_place; \
for f in ; do \
  rm -f ../.in_place/$f.dSYM; \
  ln -f -s ../mpi/$f.dSYM ../.in_place; \
done
BUILDSTDERR: libmpiwrap.c: In function 'showTy':
BUILDSTDERR: libmpiwrap.c:281:19: error: 'MPI_UB' undeclared (first use in this function)
BUILDSTDERR:     else if (ty == MPI_UB)             fprintf(f,"UB");
BUILDSTDERR:                    ^
BUILDSTDERR: libmpiwrap.c:281:19: note: each undeclared identifier is reported only once for each function it appears in
BUILDSTDERR: libmpiwrap.c:282:19: error: 'MPI_LB' undeclared (first use in this function)
BUILDSTDERR:     else if (ty == MPI_LB)             fprintf(f,"LB");
BUILDSTDERR:                    ^
BUILDSTDERR: libmpiwrap.c: In function 'extentOfTy':
BUILDSTDERR: libmpiwrap.c:462:4: warning: implicit declaration of function 'PMPI_Type_extent' [-Wimplicit-function-declaration]
BUILDSTDERR:     r = PMPI_Type_extent(ty, &n);
BUILDSTDERR:     ^
BUILDSTDERR: libmpiwrap.c: In function 'walk_type':
BUILDSTDERR: libmpiwrap.c:736:17: error: 'MPI_LB' undeclared (first use in this function)
BUILDSTDERR:        if (ty == MPI_LB || ty == MPI_UB)
BUILDSTDERR:                  ^
BUILDSTDERR: libmpiwrap.c:736:33: error: 'MPI_UB' undeclared (first use in this function)
BUILDSTDERR:        if (ty == MPI_LB || ty == MPI_UB)
BUILDSTDERR:                                  ^
BUILDSTDERR: make[2]: *** [libmpiwrap_ppc64le_linux_so-libmpiwrap.o] Error 1
Comment 1 Julian Seward 2019-03-10 08:47:59 UTC
Do you mean that OpenMPI no longer supports MPI1 at all?
Comment 2 Orion Poplawski 2019-03-10 15:00:22 UTC
Well, not really.  By default, OpenMPI 4.0 does not support MPI1.  It is possible to enable it (for now) with a compile time flag, and this is probably what we'll do in Fedora for now.  But support is going away.
Comment 3 Balint Reczey 2019-11-29 09:43:35 UTC
Created attachment 124176 [details]
Stop using symbols dropped in openmpi 4.0

I'm attaching the patch used in Ubuntu.
Comment 4 Mark Wielaard 2021-03-16 12:45:23 UTC
(In reply to Balint Reczey from comment #3)
> Created attachment 124176 [details]
> Stop using symbols dropped in openmpi 4.0
> 
> I'm attaching the patch used in Ubuntu.

Is there a reason to remove them completely instead of keep using the if   defined (...) constructs?
Comment 5 Julian Seward 2021-03-16 12:56:57 UTC
(In reply to Mark Wielaard from comment #4) 
> Is there a reason to remove them completely instead of keep using the if  
> defined (...) constructs?

I had wondered that too.  I'm trying the if-defined scheme right now.
Comment 6 Mark Wielaard 2021-03-16 14:42:04 UTC
Looking at https://www.open-mpi.org/faq/?category=mpi-removed I would suggest the following patch:

diff --git a/mpi/libmpiwrap.c b/mpi/libmpiwrap.c
index 488bb13fd..25eb66480 100644
--- a/mpi/libmpiwrap.c
+++ b/mpi/libmpiwrap.c
@@ -278,8 +278,12 @@ static void showTy ( FILE* f, MPI_Datatype ty )
    else if (ty == MPI_LONG_INT)       fprintf(f,"LONG_INT");
    else if (ty == MPI_SHORT_INT)      fprintf(f,"SHORT_INT");
    else if (ty == MPI_2INT)           fprintf(f,"2INT");
+#  if defined(MPI_UB)
    else if (ty == MPI_UB)             fprintf(f,"UB");
+#  endif
+#  if defined(MPI_LB)
    else if (ty == MPI_LB)             fprintf(f,"LB");
+#  endif
 #  if defined(MPI_WCHAR)
    else if (ty == MPI_WCHAR)          fprintf(f,"WCHAR");
 #  endif
@@ -459,7 +463,12 @@ static long extentOfTy ( MPI_Datatype ty )
 {
    int      r;
    MPI_Aint n;
+#if defined(MPI_TYPE_EXTENT)
    r = PMPI_Type_extent(ty, &n);
+#else
+   MPI_Aint lb;
+   r = MPI_Type_get_extent(ty, &lb, &n);
+#endif
    assert(r == MPI_SUCCESS);
    return (long)n;
 }
@@ -733,8 +742,10 @@ void walk_type ( void(*f)(void*,long), char* base, MPI_Datatype ty )
          f(base + offsetof(Ty,loc), sizeof(int));
          return;
       }
+#if defined(MPI_LB)
       if (ty == MPI_LB || ty == MPI_UB)
          return; /* have zero size, so nothing needs to be done */
+#endif
       goto unhandled;
       /*NOTREACHED*/
    }

That builds against an openmpi configured with --enable-mpi1-compatibility (fedora 33) and without (ubuntu 20.10)
Comment 7 Mark Wielaard 2021-03-16 23:20:23 UTC
Created attachment 136756 [details]
Only use MPI1 symbols when properly defined

Julian came up with a slightly better fix that works for me on both fedora 33 and ubuntu 20.10
Comment 8 Julian Seward 2021-03-17 07:15:30 UTC
Fixed, 3415e1e1acc5095607663071db299f961bd65bde.