[ewg] How to Debug OFED Issue After Install?
bright.yang at vaisala.com
Fri Jul 15 15:17:54 PDT 2011
Hi,
I'm new to the group and am not sure if this is the right place for my
questions.
After I installed OFED on my cluster, InfiniBand worked on my system.
Since then, the system has been moved, and all the InfiniBand cables,
nodes, and the switch were disconnected and reconnected. Now I can no
longer start my parallel computing jobs. Whenever I run the ibhosts
command, I get a message like this:
>ibhosts
src/query_smp.c:192; umad (DR path slid 0; dlid 0; 0,1,2 Attr 0x11:0)
bad status 110; Connection timed out
Ca : 0x0011750000ff585f ports 1 "compute-0-9 HCA-1"
Ca : 0x0011750000ff5815 ports 1 "compute-0-8 HCA-1"
Ca : 0x0011750000ff5860 ports 1 "compute-0-7 HCA-1"
Ca : 0x0011750000ff588e ports 1 "compute-0-6 HCA-1"
Ca : 0x0011750000ff5821 ports 1 "compute-0-5 HCA-1"
Ca : 0x0011750000ff57f3 ports 1 "compute-0-4 HCA-1"
Ca : 0x0011750000ff58a3 ports 1 "compute-0-3 HCA-1"
Ca : 0x0011750000ff579a ports 1 "compute-0-2 HCA-1"
Ca : 0x0011750000ff57d0 ports 1 "compute-0-1 HCA-1"
Ca : 0x0011750000ff58a2 ports 1 "kratos HCA-1"
Where can I get more useful debugging information for this kind of
issue? I ran dmesg, but it didn't seem to show any related messages:
>dmesg
ib_qib 0000:07:00.0: IB0:1 got a lid: 0x1
ib_mad: Method 1 already in use
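One per-node check beyond dmesg is the port state the kernel exposes through sysfs. The sketch below is a minimal illustration, assuming the standard Linux layout under /sys/class/infiniband/<device>/ports/<n>/ with the state, phys_state, and lid attribute files. The function name ib_port_report is hypothetical, and the device name (for the ib_qib driver, likely something like qib0) is not confirmed by the output above. Running it on each node after recabling should show whether any port is down or has no LID assigned.

```shell
#!/bin/sh
# Hypothetical helper: print state, physical state, and LID for every
# InfiniBand port found under the given sysfs root.
# Assumes the standard Linux layout /sys/class/infiniband/<dev>/ports/<n>/.
ib_port_report() {
    root="${1:-/sys/class/infiniband}"
    for port in "$root"/*/ports/*; do
        # Skip the literal glob pattern if no ports exist.
        [ -d "$port" ] || continue
        dev=$(basename "$(dirname "$(dirname "$port")")")
        num=$(basename "$port")
        state=$(cat "$port/state" 2>/dev/null)
        phys=$(cat "$port/phys_state" 2>/dev/null)
        lid=$(cat "$port/lid" 2>/dev/null)
        printf '%s port %s: state=%s phys_state=%s lid=%s\n' \
            "$dev" "$num" "$state" "$phys" "$lid"
    done
}

ib_port_report "$@"
```

A healthy port would typically report an ACTIVE state with LinkUp as its physical state. A port stuck in a non-active state, or one whose physical link is down, points at the cable or switch port for that node.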
Thanks.
Bright Yang