[ewg] OFED installer / Open MPI: bug 476
Jeff Squyres
jsquyres at cisco.com
Mon Mar 26 07:23:11 PDT 2007
Short version
=============
There is significant pushback in the OFA community against forcing "ugly
code" (the most recent example is making code warning-free). I am now
creating the same pushback on the OFED installer. Despite the fact
that OMPI integrates seamlessly with many other configuration/
installation systems, I have had to add hundreds of lines of code
(costing dozens of man-hours) to Open MPI to make it work with the
OFED installer. And it still breaks all the time, therefore forcing
me to add more ugly code to an already delicate/fragile system.
I'll continue to fix bugs in OMPI, fix cases where wrong flags are getting
pushed into OMPI's configure/build system from the OFED
installer, etc., but I am unwilling to "fix" bug 476 when it's not
Open MPI's problem. Someone will need to convince me why I need to
do any more work to make Open MPI integrate into the extremely
fragile, non-standard, and harmful OFED installer.
Perhaps we can discuss this on the call today.
Long version
============
Bug 476 was a 3rd repeat of the same issue: someone stating that Open
MPI could not be installed in OFED 1.2. It was the 2nd time that
someone filed it as a P1/blocker issue.
The real problem is that Open MPI cannot be built under the OFED
installer when OFED is already installed *because of the non-standard
way the OFED installer works*. I specifically stated that this exact
issue would be a problem back when I was getting pressure to make
Open MPI build under the OFED installer's "build" phase, but no one
listened/cared/understood. Now it's apparently a P1/blocker problem.
This is one source of my frustration: to say "this is going to be a
problem!", have everyone ignore it, and then, when it happens, have it
filed (twice) as a P1/blocker.
All 3 MPI packages have had to go through significant hurdles to
integrate into the OFED installer. Whenever a new problem occurs,
the MPI maintainers are told, "it's your problem -- go fix it." So
we go write more code, put in horrid/ugly workarounds, and we grumble
amongst ourselves.
We have already agreed to redesign the installer for OFED 1.3 (which
is great), and that we won't do any significant changes to the
current OFED 1.2 installer because it's too late in the release
process (which we'll cope with). I have even volunteered to partake
in the redesign effort. But the problem is the fact that many of the
recent Open MPI / MVAPICH / MVAPICH2 bugs in OFED 1.2 have been
directly or indirectly because of the installer. The installer is
therefore causing/contributing to confusion and delay of OFED 1.2.
Since the OFED installer is directly/indirectly responsible for many
of the MPI problems, perhaps the OFED 1.2 installer can be fixed
instead of pushing the work off to the MPIs. In particular, bug
476: the OFED installer should disallow building OFED when OFED is
already installed. I realize that this makes the OFED installer's
"build" phase [significantly] less useful. But it will take a lot of
convincing to have me "fix" Open MPI when it's not Open MPI's problem.
Bottom line: although the other MPIs may have already done the work
to make this feature work for them, I am extremely unhappy at the
prospect of doing so because I have already spent dozens of man-hours
distorting OMPI's standards-conformant configure/build system to fit
the non-standard OFED. I a) do not want to make OMPI's build
integration with OFED any uglier or more difficult to maintain than it
already is and b) have other things to do.
Before you ask: yes, of course I'll still fix bugs in Open MPI, or
where wrong flags are getting pushed into OMPI's build process, or
other similar issues; that's not what I'm talking about here.
--
Jeff Squyres
Cisco Systems