[ofa-general] verb level interoperability between vendor's hcas
Scott A. Friedman
friedman at ucla.edu
Sun Jun 28 11:03:10 PDT 2009
We have had several tickets submitted by users since we started adding
Qlogic 7240 cards to our cluster, which is mostly Mellanox (we have a
couple of different cards). We have looked at the codes (MPI based), and
they run fine when the Qlogic cards are excluded. Qlogic suggests using
PSM or IPoIB on our cluster, both of which seem like a punt to us, as
PSM doesn't make sense with Mellanox and IPoIB is not a solution.
Right now, we are trying to figure out where the problem is. It is not
at the application level, as we have distilled it down to a specific
case which will cause a problem (MPI all-to-all, for example). However,
some things seem clearer to us:
1. test case works when using verbs on Mellanox only
2. test case works ok when we use PSM on Qlogic only
3. test case fails when using verbs between Mellanox and Qlogic
4. test case fails when using verbs on Qlogic
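
For readers unfamiliar with the pattern named above, here is a minimal
sketch of what an all-to-all exchange does and how a reproducer can check
the result. It is not the failing test case from this post and it does
not use MPI or verbs at all: it simulates the exchange in a single
process, and the names alltoall, make_send_bufs and find_bad_blocks are
hypothetical, chosen only for illustration.

```python
def alltoall(send_bufs):
    """Simulate all-to-all semantics in one process (no MPI involved).

    send_bufs[i][j] is the block that rank i sends to rank j.
    The result recv_bufs[j][i] is the block rank j received from rank i.
    """
    n = len(send_bufs)
    if any(len(row) != n for row in send_bufs):
        raise ValueError("each rank must supply exactly one block per rank")
    return [[send_bufs[src][dst] for src in range(n)] for dst in range(n)]


def make_send_bufs(n):
    # Tag every block with (source, destination) so that misrouted or
    # corrupted data can be detected on the receiving side.
    return [[(src, dst) for dst in range(n)] for src in range(n)]


def find_bad_blocks(recv_bufs):
    # Return (receiver, sender) pairs whose block lacks the expected tag.
    n = len(recv_bufs)
    return [(dst, src)
            for dst in range(n)
            for src in range(n)
            if recv_bufs[dst][src] != (src, dst)]


if __name__ == "__main__":
    recv = alltoall(make_send_bufs(4))
    print("bad blocks:", find_bad_blocks(recv))
```

In a real reproducer the simulated exchange would be replaced by the MPI
all-to-all call, and the same kind of per-block check, run once for each
of the four configurations listed above, would show which sender and
receiver pairs are affected.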
Is this a verb level issue with the ipath stuff or an MPI problem? Or
is the issue someplace else? There was some discussion of a mixed
environment early this year on the OMPI list, but the thread petered out.
We would be happy to share our failing test case with whoever does the
interop testing, if it could shed some light on the problem we see.
The point is that we would like to know that different IB cards work
together (like Ethernet) so we can have a choice.
Sean Hefty wrote:
>> Is a mixed HCA environment cluster not ready for prime time - yet?
>
> Are the crashes in the kernel or userspace? Is there a specific HCA on the
> nodes that crash?
>
> Interop testing is done, but I do not know the details of the configurations and
> tests that are run.
>