[ofa-general] verb level interoperability between vendor's hcas

Jeff Squyres jsquyres at cisco.com
Mon Jun 29 03:55:54 PDT 2009


There are definite things that won't work by default in Open MPI with  
mixed vendor HCAs.

As you mentioned, the Open MPI IB vendors committed to making this  
work, but the effort kinda died.  You might want to ping them again to  
remind them...?


On Jun 28, 2009, at 2:03 PM, Scott A. Friedman wrote:

> We have had several tickets submitted by users since we started adding
> Qlogic 7240 cards to our cluster, which is mostly Mellanox (we have a
> couple of different cards). We have looked at the codes (MPI based), and
> they run fine when the Qlogic cards are excluded. Qlogic suggests using
> PSM or IPoIB on our cluster, both of which seem like a punt to us, as PSM
> doesn't make sense with Mellanox and IPoIB is not a solution.
>
> Right now, we are trying to figure out where the problem is. It is not
> at the application level, as we have distilled it down to a specific case
> that causes a problem (MPI all-to-all, for example). However, some
> things seem clearer to us.
>
> 1. test case works when using verbs using Mellanox only
> 2. test case works ok when we use PSM on Qlogic only
> 3. test case fails when using verbs between Mellanox and Qlogic
> 4. test case fails when using verbs on Qlogic
>
> Is this a verb level issue with the ipath stuff or an MPI problem? Or
> is the issue someplace else? There had been some discussion of a mixed
> environment early this year on the OMPI list, but the thread petered out.
>
> We would be happy to share our failing test case with whoever does the
> interop testing, if it could shed some light on the problem we see.
>
> The point is that we would like to know that different IB cards work
> together (like ethernet) so we can have a choice.
>
> Sean Hefty wrote:
> >> Is a mixed HCA environment cluster not ready for prime time - yet?
> >
> > Are the crashes in the kernel or userspace?  Is there a specific HCA
> > on the nodes that crash?
> >
> > Interop testing is done, but I do not know the details of the
> > configurations and tests that are run.
> >


-- 
Jeff Squyres
Cisco Systems



