[ofa-general] troubleshooting with InfiniBand

Vittorio vitto.giova at yahoo.it
Fri Feb 13 19:34:03 PST 2009


Hello!
This is my first message on the list, so I hope I'm not asking a
silly or already-answered question.

I'm a student porting an electromagnetic field simulator to a
parallel, distributed Linux cluster for my final thesis; I'm using both
OpenMP and MPI over InfiniBand to achieve speed improvements.

The OpenMP part is done, and now I'm facing problems setting up MPI over
InfiniBand.
I have loaded the kernel modules,
installed the right drivers for the board (a Mellanox HCA) and the userspace
programs,
and installed the MVAPICH2 MPI implementation.
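
In case it helps, these are the checks I understand should confirm that the
stack is up (assuming the mlx4 driver from OFED, which as far as I know is
what the ConnectX cards use):

# check that the ConnectX and core IB modules are loaded
lsmod | grep -e mlx4 -e ib_

# check the HCA port; I expect "State: Active" and "Physical state: LinkUp"
ibstat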

However, I fail to run all of this together:
for example, ibhosts correctly finds the two connected nodes

Ca    : 0x0002c90300018b8e ports 2 " HCA-1"
Ca    : 0x0002c90300018b12 ports 2 "localhost HCA-1"

but ibping doesn't receive any response

ibwarn: [32052] ibping: Ping..
ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [32052] main: ibping to Lid 2 failed
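
For reference, this is roughly how I am invoking it; if I read the man page
correctly, the destination node first has to run ibping in server mode
(LID 2 comes from the ibhosts output above):

# on the remote node: start the ibping responder
ibping -S

# on the local node: ping the remote port by LID
ibping 2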

Consequently, any other operation with MPI fails.
Strangely enough, IPoIB works very well and I can ping and connect
with no problems.
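
In case it matters, this is more or less how I am launching the MPI jobs
(node1, node2 and the cpi test program are just placeholders for my actual
hostnames and binary):

# start a 2-process job across the two nodes with MVAPICH2's mpirun_rsh
mpirun_rsh -np 2 node1 node2 ./cpi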

The two machines are identical and are connected point to point with a
crossover cable (no switch).
lspci identifies the boards as:
03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0
2.5GT/s] (rev a0)
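
One more detail in case it is relevant: since there is no switch, as far as
I understand a subnet manager has to run on one of the two nodes; this is
how I believe it can be checked (sminfo and opensm come with the OFED
packages):

# query the fabric for a master subnet manager
sminfo

# if none answers, start opensm on one of the nodes
/etc/init.d/opensmd start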

What could be the cause of all of this? Am I forgetting something?
Any help is greatly appreciated.
Thank you
Vittorio


More information about the general mailing list