[ofa-general] Troubleshooting with InfiniBand
Vittorio
vitto.giova at yahoo.it
Fri Feb 13 19:34:03 PST 2009
Hello!
This is my first message on the list, so I hope I'm not asking a silly
or already-answered question.
I'm a student porting an electromagnetic field simulator to a parallel
and distributed Linux cluster for my final thesis; I'm using both OpenMP
and MPI over InfiniBand to achieve speed improvements.
The OpenMP part is done, and now I'm facing problems setting up MPI over
InfiniBand.
I have correctly set up the kernel modules, installed the right drivers
for the board (a Mellanox HCA) and the userspace programs, and installed
the MVAPICH2 MPI implementation.
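Concretely, these are roughly the sanity checks I ran on each node (the mlx4 module names assume the ConnectX driver stack from OFED):

```shell
# Kernel modules for a ConnectX HCA (mlx4 core plus the IB upper layers)
lsmod | grep -E 'mlx4|^ib_'

# Userspace verbs stack: the HCA should be listed with its ports
ibv_devinfo

# Port state should be "Active", not "Initializing" or "Down"
ibstat
```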
However, I fail to run all of this together:
for example, ibhosts correctly finds the two connected nodes
Ca : 0x0002c90300018b8e ports 2 " HCA-1"
Ca : 0x0002c90300018b12 ports 2 "localhost HCA-1"
but ibping doesn't receive responses:
ibwarn: [32052] ibping: Ping..
ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [32052] main: ibping to Lid 2 failed
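One thing I'm not sure about: as far as I understand, ibping needs a responder process running on the target node, unlike ICMP ping, which is answered in the kernel. If that's the case, the test would look something like this (LID 2 taken from the error above):

```shell
# On the remote node (the one with LID 2): start the ibping responder
ibping -S

# On the local node: ping the remote port by LID
ibping -L 2
```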
Subsequently, any other operation with MPI fails.
Strangely enough, IPoIB works very well and I can ping and connect with
no problems.
The two machines are identical and connected point to point with a
crossover cable (no switch).
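Also, since a back-to-back link has no switch with an embedded subnet manager, my understanding is that one of the two hosts has to run opensm; IPoIB working suggests LIDs were assigned at some point, but I checked that it is still up:

```shell
# One of the two hosts must run the subnet manager on a
# point-to-point link; check that opensm is actually running
pgrep -l opensm

# ...and that the port can reach an SM (sminfo queries the subnet manager)
sminfo
```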
lspci identifies the boards as
03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0
2.5GT/s] (rev a0)
What can be the cause of all this? Am I forgetting something?
Any help is greatly appreciated.
Thank you,
Vittorio