Re: [ofa-general] troubleshooting with InfiniBand

Vittorio vitto.giova at yahoo.it
Sat Feb 14 05:49:54 PST 2009


Thanks for the suggestion, but I don't understand which kind of address I
should pass to the two commands.
I tried ibping against the server (as suggested), and it works with
-G <port GUID> or with the LID.

But what should I pass as the argument to ibv_rc_pingpong and rping?

Thanks a lot,
Vittorio

On Sat, Feb 14, 2009 at 8:23 AM, Dotan Barak <dotanba at gmail.com> wrote:

> Vittorio wrote:
>
>> Hello!
>> This is my first message on the list, so I hope I'm not asking a silly
>> or already answered question.
>>
>> I'm a student porting an electromagnetic field simulator to a parallel,
>> distributed Linux cluster for my final thesis; I'm using both OpenMP and
>> MPI over InfiniBand to speed it up.
>>
>> The OpenMP part is done, and now I'm having problems setting up MPI over
>> InfiniBand. I have correctly set up the kernel modules, installed the
>> right drivers for the board (Mellanox HCA) and the userspace programs,
>> and installed the MVAPICH2 MPI implementation.
>>
>> However, I can't get it all working together. For example, ibhosts
>> correctly finds the two connected nodes:
>>
>> Ca    : 0x0002c90300018b8e ports 2 " HCA-1"
>> Ca    : 0x0002c90300018b12 ports 2 "localhost HCA-1"
>>
>> But ibping doesn't receive any responses:
>>
>> ibwarn: [32052] ibping: Ping..
>> ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
>> ibwarn: [32052] main: ibping to Lid 2 failed
>>
>> Any MPI operation I try afterwards also fails. Strangely enough, IPoIB
>> works very well, and I can ping and connect with no problems.
>>
>> The two machines are identical and are connected directly with a
>> crossover cable (point to point). lspci identifies the boards as:
>> 03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)
>>
>> What could be causing this? Am I forgetting something?
>> Any help is greatly appreciated.
>> Thank you
>> Vittorio
>>
> I suggest that you run ibv_rc_pingpong and check that the IB
> connectivity is OK.
> Then try running rping to check that ib_cma (the RDMA connection
> manager) is OK.
>
> Those will be a good starting point for finding the problem
> (do it for every active port you have).
>
>
> Dotan
>
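A minimal sketch of how the two tests are commonly launched, written in Python only to spell out the argument rules: each tool is started first on one node as the server, and the client on the other node is then given the server's address. For ibv_rc_pingpong, the server takes no address argument, and the client's optional hostname argument is used only for an out-of-band TCP exchange of connection details, so any reachable IP address (including the IPoIB one) works. For rping, the address given with -a must be an IP address bound to the RDMA device itself, i.e. the IPoIB address of ib0. The device name mlx4_0, port 1, the count of 10, the address 192.168.10.1, and the helper functions themselves are hypothetical assumptions for illustration, not part of either tool.

```python
"""Hypothetical helpers that build the command lines for the two tests.

Assumptions (not from the thread): device mlx4_0, IB port 1, and the
server's IPoIB address 192.168.10.1. Run `ibv_devices` to see real names.
"""
import shlex
from typing import List, Optional


def rc_pingpong_cmd(server_host: Optional[str] = None,
                    ib_dev: str = "mlx4_0", ib_port: int = 1) -> List[str]:
    """argv for ibv_rc_pingpong.

    server_host=None -> server side: no peer argument, it waits for a client.
    server_host set  -> client side: the server's hostname or IP, used only
    for the TCP exchange of connection details before the RC test runs.
    """
    argv = ["ibv_rc_pingpong", "-d", ib_dev, "-i", str(ib_port)]
    if server_host is not None:
        argv.append(server_host)
    return argv


def rping_cmd(server_addr: str, is_server: bool, count: int = 10) -> List[str]:
    """argv for rping.

    Both sides pass the server's IPoIB address with -a: the server binds to
    it, and the client connects to it through the RDMA CM (ib_cma).
    """
    if is_server:
        return ["rping", "-s", "-a", server_addr, "-v"]
    return ["rping", "-c", "-a", server_addr, "-v", "-C", str(count)]


if __name__ == "__main__":
    server_ip = "192.168.10.1"  # hypothetical IPoIB (ib0) address of the server
    steps = [
        ("server node", rc_pingpong_cmd()),
        ("client node", rc_pingpong_cmd(server_ip)),
        ("server node", rping_cmd(server_ip, is_server=True)),
        ("client node", rping_cmd(server_ip, is_server=False)),
    ]
    for node, argv in steps:
        print(f"{node}: {shlex.join(argv)}")
```

Start each server command first, then the matching client; to cover the second port of each HCA, repeat the ibv_rc_pingpong test with ib_port=2 (i.e. -i 2).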