[ewg] OFED installer / Open MPI: bug 476

Jeff Squyres jsquyres at cisco.com
Mon Mar 26 07:23:11 PDT 2007


Short version
=============

There is significant pushback in the OFA community against forcing "ugly  
code" (most recent example is making warning-free code).  I am now  
creating the same pushback on the OFED installer.  Despite the fact  
that OMPI integrates seamlessly with many other configuration/ 
installation systems, I have had to add hundreds of lines of code  
(costing dozens of man-hours) to Open MPI to make it work with the  
OFED installer.  And it still breaks all the time, therefore forcing  
me to add more ugly code to an already delicate/fragile system.

I'll continue to fix bugs in OMPI, fix cases where wrong flags are getting  
pushed into OMPI's configure/build system from the OFED  
installer, etc., but I am unwilling to "fix" bug 476 when it's not  
Open MPI's problem.  Someone will need to convince me why I need to  
do any more work to make Open MPI integrate into the extremely  
fragile, non-standard, and harmful OFED installer.

Perhaps we can discuss this on the call today.

Long version
============

Bug 476 was a 3rd repeat of the same issue: someone stating that Open  
MPI could not be installed in OFED 1.2.  It was the 2nd time that  
someone filed it as a P1/blocker issue.

The real problem is that Open MPI cannot be built under the OFED  
installer when OFED is already installed *because of the non-standard  
way the OFED installer works*.  I specifically stated that this exact  
issue would be a problem back when I was getting pressure to make  
Open MPI build under the OFED installer's "build" phase, but no one  
listened/cared/understood.  Now it's apparently a P1/blocker problem.

This is one source of my frustration: to say "this is going to be a  
problem!", have everyone ignore it, and then, when I do the work anyway,  
have the resulting problem filed (twice) as a P1/blocker.

All 3 MPI packages have had to go through significant hurdles to  
integrate into the OFED installer.  Whenever a new problem occurs,  
the MPI maintainers are told, "it's your problem -- go fix it."  So  
we go write more code, put in horrid/ugly workarounds, and we grumble  
amongst ourselves.

We have already agreed to redesign the installer for OFED 1.3 (which  
is great), and that we won't do any significant changes to the  
current OFED 1.2 installer because it's too late in the release  
process (which we'll cope with).  I have even volunteered to partake  
in the redesign effort.  But the problem is that many of the  
recent Open MPI / MVAPICH / MVAPICH2 bugs in OFED 1.2 have been caused,  
directly or indirectly, by the installer.  The installer is  
therefore causing/contributing to confusion and delay of OFED 1.2.

Since the OFED installer is directly/indirectly responsible for many  
of the MPI problems, perhaps the OFED 1.2 installer can be fixed  
instead of pushing the work off to the MPIs.  In particular, bug  
476: the OFED installer should disallow building OFED when OFED is  
already installed.  I realize that this makes the OFED installer's  
"build" phase [significantly] less useful.  But it will take a lot of  
convincing to have me "fix" Open MPI when it's not Open MPI's problem.

Bottom line: although the other MPIs may have already done the work  
to make this feature work for them, I am extremely unhappy at the  
prospect of doing so because I have already spent dozens of man-hours  
distorting OMPI's standards-conformant configure/build system to fit  
the non-standard OFED.  I a) do not want to make OMPI's build  
integration with OFED any uglier or more difficult to maintain than it  
already is and b) have other things to do.

Before you ask: yes, of course I'll still fix bugs in Open MPI, or  
where wrong flags are getting pushed into OMPI's build process, or  
other similar issues; that's not what I'm talking about here.

-- 
Jeff Squyres
Cisco Systems



