[ewg] Re: [ofa-general] OFED 1.2 Feb-26 meeting summary

Doug Ledford dledford at redhat.com
Fri Mar 16 07:05:27 PDT 2007


On Fri, 2007-03-02 at 20:42 -0500, Jeff Squyres wrote:
> To be totally clear, there are three issues:
> 
> 1. *NOT AN MPI ISSUE*: base location of the stack.  Doug has  
> repeatedly mentioned that /usr/local/ofed is not good.  This is a  
> group issue to decide.

As long as the base OFED stack installs into /usr/local/ofed, anyone who
calls Red Hat support for IB help with RHEL5 or RHEL4U5 will be told
that they must first delete all locally built OFED RPMs from the system.
It simply isn't realistic for us to support a system where conflicting
libraries can exist in different locations, and where attempts to
resolve a problem can end up fruitless simply because the wrong library
is being linked in behind our backs.
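
For illustration (hypothetical paths and application name), this is the
kind of silent mislink we end up chasing:

# distro copy and OFED copy of the same library, different prefixes
ls /usr/lib64/libibverbs.so.1 /usr/local/ofed/lib64/libibverbs.so.1

# if /usr/local/ofed/lib64 ends up in ld.so.conf or LD_LIBRARY_PATH,
# an app built against the distro package quietly resolves to the
# OFED copy instead:
LD_LIBRARY_PATH=/usr/local/ofed/lib64 ldd ./rdma_app | grep libibverbs
    libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x...)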

> 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm).  Not  
> deleting the buildroot is Bad; munging %build into %install is  
> Bad; ...etc.  This needs to change.  4 choices jump to mind:
> 
>     a. Keep the same scheme.  Ick.
>     b. Install while we build (i.e., the normal way to build a pile  
> of interdependent RPMs)
>     c. Use chroot (Red Hat does this in their internal setup, for  
> example)
>     d. Only distribute binary RPMs for supported platforms; source is  
> available for those who want it.

Option d. is the normal route for anyone wanting to provide a known
working environment.  Building locally is fraught with perils related to
custom compilers, custom core libraries, and other things that the EWG
can't control and can't realistically support.
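
To be concrete about the %build vs. %install point, the conventional
layout keeps the two stages separate and cleans the buildroot.  A
minimal sketch (hypothetical package, nothing OFED specific):

%build
%configure
make %{?_smp_mflags}

%install
rm -rf %{buildroot}
make install DESTDIR=%{buildroot}

%clean
rm -rf %{buildroot}

That structure is also what makes option b. workable: each package
installs into the build environment as it is built, so later packages in
the pile can build against it.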

> 3. Doug's final point about allowing multiple MPI's to play
> harmoniously on a single system is obviously an MPI issue.  The
> /etc/alternatives mechanism is not really good enough (IMHO) for this
> -- /etc/alternatives is about choosing one implementation and making
> everyone use it.  The problem is that when multiple MPI's are
> installed on a single system, people need all of them (some users
> prefer one over the other, but much more important, some apps are
> only certified with one MPI or another).

Correct.  You need the various MPI stacks to all be usable at the same
time.  The alternatives system doesn't really provide for this,
especially since OpenMPI expects to know how to behave based upon
argv[0].
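
To sketch what that means (hypothetical wrapper, not OpenMPI's actual
code): a single front end reached through differently named symlinks can
only pick the right behavior if the name it was invoked as still means
something:

#!/bin/sh
# hypothetical wrapper installed once, reached via symlinks named
# mpicc, mpic++, mpif77, ...; it branches on the invoked name
case "$(basename "$0")" in
    mpicc)  backend=gcc ;;
    mpic++) backend=g++ ;;
    mpif77) backend=g77 ;;
    *)      echo "unknown wrapper name: $0" >&2; exit 1 ;;
esac
exec "$backend" "$@"

The alternatives system collapses all of those names down to whichever
implementation is currently selected, which is the opposite of what you
want when several MPIs have to coexist.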

>   The mpi-selector tool we  
> introduced in OFED 1.2 will likely be "good enough" for this purpose,  
> but we can also work on integrating the /etc/alternatives stuff if  
> desired, particularly for those who only need/want one MPI  
> implementation.

We implemented unique executable names, with symlinks to those unique
names, so that OpenMPI sees a working argv[0] regardless of which MPI is
the default.  So, for instance, if you want to use i386 OpenMPI on
x86_64, you can do this:

export PATH=/usr/share/openmpi/bin32:$PATH
mpicc -o blah blah.c

and things just work.  The /usr/share/openmpi/bin32 directory has the
right symlinks to the files in /usr/bin to make it happen.

That being said, I'm not really happy with that solution and would
prefer one that works for all the MPIs.  The only FHS compliant location
I know of where we can put a bin directory under our own parent
directory is /opt.  So I would suggest that, for OpenMPI at least, we
standardize on installing into /opt/openmpi/$VERSION_MAJOR-$CC, and
under that have bin (we only need one bin if we can get the single
binaries to support both 32- and 64-bit operation via a command line
switch), lib, lib64, share, man, etc.  Users can then do the same basic
thing as above, but using a path in /opt.  The alternatives system could
link the system-wide default to the binaries in /opt easily enough.
That allows both a system-wide default and a user-specified version to
work seamlessly.
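
A rough sketch of what I have in mind (hypothetical version and compiler
tags; the alternatives call is illustrative, not a worked-out proposal):

# per-implementation prefix under /opt
/opt/openmpi/1.2-gcc/bin/mpicc
/opt/openmpi/1.2-gcc/lib
/opt/openmpi/1.2-gcc/lib64
/opt/openmpi/1.2-gcc/share/man

# a user picks a specific build explicitly:
export PATH=/opt/openmpi/1.2-gcc/bin:$PATH
mpicc -o blah blah.c

# while the system-wide default is selected via alternatives:
update-alternatives --install /usr/bin/mpicc mpicc \
    /opt/openmpi/1.2-gcc/bin/mpicc 50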

> 
> On Feb 28, 2007, at 12:56 PM, Doug Ledford wrote:
> 
> > On Wed, 2007-02-28 at 16:10 +0200, Tziporet Koren wrote:
> >>       * Improved RPM usage by the install will not be part of OFED
> >>         1.2
> >
> > Since I first brought this up, you have added new libraries, iWARP
> > support, etc.  These constitute new RPMs.  And, because you guys have
> > been doing things contrary to standards like the Filesystem Hierarchy
> > Standard in the original RPMs, that has been carried forward to these
> > new RPMs.  This is a snowball, and the longer you put off fixing it,
> > the harder it gets to change.  And not just in your RPMs either.  The
> > longer you put off coming up with a reasonable standard for MPI
> > library and executable file locations, the longer customers will hand
> > roll their own site-specific setups, and the harder it will be to get
> > them to switch over to the standard once you *do* implement it.  You
> > may end up dooming Jeff to maintaining those custom file location
> > hacks in the OpenMPI spec forever.
> >
> > Not to mention that interoperability is about more than one machine
> > talking to another machine.  It's also about a customer's application
> > building properly on different versions of the stack, without the
> > customer needing to change all the include file locations and link
> > parameters.  It's also about a customer being able to rest assured
> > that if they tried to install two conflicting copies of libibverbs,
> > it would in fact cause RPM to throw conflict errors.  It doesn't now,
> > because your libibverbs is in /usr/local, where I'm not allowed to
> > put ours; since the files are in different locations, rpm will
> > happily let the user install both your libibverbs and mine without a
> > conflict, and a customer could waste large amounts of time tracking
> > down a bug in one library only to find out their application is
> > linking against the other.
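
(To make that concrete with hypothetical package names: rpm only refuses
an install when two packages claim the same path, so with different
prefixes both go in cleanly and the conflict never surfaces:

rpm -ivh libibverbs-1.0-1.x86_64.rpm          # files under /usr
rpm -ivh ofed-libibverbs-1.2-1.x86_64.rpm     # files under /usr/local/ofed
# no file conflict reported; two copies of the library now coexist

whereas two packages shipping the same /usr/lib64/libibverbs.so.1 would
fail with a file conflict error.)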
> >
> >>               * The RPM usage will be enhanced for the next (1.3)
> >>                 release and we will decide on the correct way in
> >>                 Sonoma.
> >
> >
> >
> > There's not really much to decide.  Either the stack is Filesystem
> > Hierarchy Standard compliant or it isn't.  The only leeway for
> > decisions allowed by the standard is on things like where in /etc to
> > put the config files.  Since you are striving to be a generic RDMA
> > stack, not just an IB stack, I would suggest that all RDMA related
> > config files go into /etc/rdma, and for those applications that can
> > reasonably be run absent RDMA technology, like OpenMPI, I would
> > separate their config files off into either /etc or /etc/openmpi.
> > Ditto for the include directories: /usr/include/rdma for the generic,
> > non-IB-specific stuff, and possibly /usr/include/rdma/infiniband for
> > the IB specific stuff, or you could put the IB stuff under
> > /usr/include/infiniband; either way works.
> >
> > The biggest variation from the spec that needs to be dealt with is the
> > need for multiple MPI installations, which is problematic if you just
> > use generic locations as it stands today, but with a few modifications
> > to the MPI stack it could be worked around.
> >
> >
> > -- 
> > Doug Ledford <dledford at redhat.com>
> >               GPG KeyID: CFBFF194
> >               http://people.redhat.com/dledford
> >
> > Infiniband specific RPMs available at
> >               http://people.redhat.com/dledford/Infiniband
> 
> 
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband