[ofa-general] verb level interoperability between vendor's hcas

Ross Smith myxiplx at googlemail.com
Mon Jun 29 04:00:51 PDT 2009


Guys, I might be misunderstanding, but:

> 1. test case works when using verbs using Mellanox only
> 2. test case works ok when we use PSM on Qlogic only
> 3. test case fails when using verbs between Mellanox and Qlogic
> 4. test case fails when using verbs on Qlogic

Doesn't this show that verbs / MPI fail full stop with Qlogic?  That
doesn't look like vendor interoperability so much as a bug with verbs
and Qlogic hardware.
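
To spell out the reasoning, here is a rough sketch in Python.  It just
encodes the four results quoted above and looks for the
(transport, vendor) combination that shows up in every failing run and
in no passing run.  The names (observations, failing_factor) are made
up for illustration and have nothing to do with any real tool.

```python
# The four results from the quoted report: (transport, HCA vendors involved, passed?)
observations = [
    ("verbs", {"Mellanox"}, True),
    ("psm", {"Qlogic"}, True),
    ("verbs", {"Mellanox", "Qlogic"}, False),
    ("verbs", {"Qlogic"}, False),
]


def failing_factor(obs):
    """Return the (transport, vendor) pairs present in every failing run
    and absent from every passing run."""
    failing = [{(t, v) for v in vendors} for t, vendors, ok in obs if not ok]
    passing = [{(t, v) for v in vendors} for t, vendors, ok in obs if ok]
    common = set.intersection(*failing) if failing else set()
    for run in passing:
        common -= run
    return common


print(failing_factor(observations))  # -> {('verbs', 'Qlogic')}
```

With only four data points this is suggestive rather than conclusive,
but the one combination it singles out is verbs on Qlogic, not the mix
of vendors.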

Ross
(Infiniband newbie)



On Sun, Jun 28, 2009 at 7:03 PM, Scott A. Friedman <friedman at ucla.edu> wrote:
> We have had several tickets submitted by users since we have started adding
> Qlogic 7240 cards into our cluster which is mostly Mellanox (we have a
> couple different cards). We have looked at the codes (MPI based) and they do
> run fine when the Qlogic cards are excluded. Qlogic suggests using PSM or
> IPoIB on our cluster - both of which seem like a punt to us as PSM doesn't
> make sense with Mellanox and IPoIB is not a solution.
>
> Right now, we are trying to figure out where the problem is - it is not at
> the application level as we have distilled it down to a specific case which
> will cause a problem (MPI all-to-all, for example). However, some things
> seem clearer to us.
>
> 1. test case works when using verbs using Mellanox only
> 2. test case works ok when we use PSM on Qlogic only
> 3. test case fails when using verbs between Mellanox and Qlogic
> 4. test case fails when using verbs on Qlogic
>
> Is this a verb level issue with the ipath stuff or an mpi problem? Or, is
> the issue someplace else? There had been some discussion of a mixed
> environment early this year on the OMPI list but the thread petered out.
>
> We would be happy to share our failing test case with whoever does the
> interop testing - if it could shed some light on the problem we see.
>
> The point is that we would like to know that different IB cards work
> together (like ethernet) so we can have a choice.
>
> Sean Hefty wrote:
>>>
>>> Is a mixed HCA environment cluster not ready for prime time - yet?
>>
>> Are the crashes in the kernel or userspace?  Is there a specific HCA on the
>> nodes that crash?
>>
>> Interop testing is done, but I do not know the details of the configurations
>> and tests that are run.