[ofa-general] verb level interoperability between vendor's hcas

Scott A. Friedman friedman at ucla.edu
Sun Jun 28 11:03:10 PDT 2009


Users have submitted several tickets since we started adding QLogic 7240 
cards to our cluster, which is mostly Mellanox (a couple of different 
cards). We have looked at the codes (MPI based), and they run fine when 
the QLogic cards are excluded. QLogic suggests using PSM or IPoIB on our 
cluster. Both seem like a punt to us: PSM doesn't make sense with 
Mellanox, and IPoIB is not a solution.

Right now, we are trying to figure out where the problem is. It is not 
at the application level, as we have distilled it down to a specific 
case that causes the problem (an MPI all-to-all, for example). Some 
things do seem clearer to us, though:

1. Test case works when using verbs on Mellanox only
2. Test case works when using PSM on QLogic only
3. Test case fails when using verbs between Mellanox and QLogic
4. Test case fails when using verbs on QLogic only
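
As a rough cross-check of the four observations above, here is a small 
sketch that tests a few candidate explanations against them. The 
hypothesis names and the fails_iff helper are illustrative assumptions, 
not part of our actual test case; only the four observations come from 
our runs.

```python
# The four observations: (transport, HCA vendors involved, test passed?)
observations = [
    ("verbs", {"Mellanox"}, True),
    ("psm", {"QLogic"}, True),
    ("verbs", {"Mellanox", "QLogic"}, False),
    ("verbs", {"QLogic"}, False),
]


def fails_iff(predicate, obs):
    """Return True if predicate holds for exactly the failing observations."""
    return all(predicate(t, v) == (not ok) for t, v, ok in obs)


# Hypothetical explanations (illustrative only): each maps a run's
# transport and vendor set to "this run should fail".
hypotheses = {
    "any use of verbs": lambda t, v: t == "verbs",
    "any QLogic HCA involved": lambda t, v: "QLogic" in v,
    "mixed vendors": lambda t, v: len(v) > 1,
    "verbs on a QLogic HCA": lambda t, v: t == "verbs" and "QLogic" in v,
}

consistent = [name for name, p in hypotheses.items()
              if fails_iff(p, observations)]
print(consistent)
```

Of these candidates, only "verbs on a QLogic HCA" matches all four 
results. That narrows things to the QLogic verbs path, but it cannot by 
itself tell us whether the fault is in the verbs layer or in how MPI 
uses it.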

Is this a verbs-level issue with the ipath stuff or an MPI problem? Or 
is the issue someplace else? There was some discussion of a mixed 
environment on the OMPI list early this year, but the thread petered out.

We would be happy to share our failing test case with whoever does the 
interop testing, if it could shed some light on the problem we see.

The point is that we would like to know that different IB cards work 
together (like Ethernet) so we can have a choice.

Sean Hefty wrote:
>> Is a mixed HCA environment cluster not ready for prime time - yet?
> 
> Are the crashes in the kernel or userspace?  Is there a specific HCA on the
> nodes that crash?
> 
> Interop testing is done, but I do not know the details of the configurations and
> tests that are run. 
> 


