[ewg] How to Debug OFED Issue After Install?

bright.yang at vaisala.com bright.yang at vaisala.com
Fri Jul 15 15:17:54 PDT 2011


Hi,

I'm new to the group and am not sure if this is the right place for my
question:

 

After I installed OFED on my cluster, I was able to use InfiniBand on
the system. Since then the system has been moved, and all the InfiniBand
cables, nodes, and the switch were disconnected and reconnected. Now I
can no longer start my parallel computing jobs. Whenever I run the
ibhosts command, I get a message like this:

 

>ibhosts
src/query_smp.c:192; umad (DR path slid 0; dlid 0; 0,1,2 Attr 0x11:0) bad status 110; Connection timed out
Ca      : 0x0011750000ff585f ports 1 "compute-0-9 HCA-1"
Ca      : 0x0011750000ff5815 ports 1 "compute-0-8 HCA-1"
Ca      : 0x0011750000ff5860 ports 1 "compute-0-7 HCA-1"
Ca      : 0x0011750000ff588e ports 1 "compute-0-6 HCA-1"
Ca      : 0x0011750000ff5821 ports 1 "compute-0-5 HCA-1"
Ca      : 0x0011750000ff57f3 ports 1 "compute-0-4 HCA-1"
Ca      : 0x0011750000ff58a3 ports 1 "compute-0-3 HCA-1"
Ca      : 0x0011750000ff579a ports 1 "compute-0-2 HCA-1"
Ca      : 0x0011750000ff57d0 ports 1 "compute-0-1 HCA-1"
Ca      : 0x0011750000ff58a2 ports 1 "kratos HCA-1"
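
One way to separate the error line from the node list in output like the
above is a short script. The following is only a minimal sketch in
Python, assuming the line format shown above; the function name
parse_ibhosts and the shortened sample text are illustrative and are not
part of OFED.

```python
import re

# Matches node lines in the format printed above, e.g.
#   Ca      : 0x0011750000ff585f ports 1 "compute-0-9 HCA-1"
CA_LINE = re.compile(r'^Ca\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+(\d+)\s+"(.*)"\s*$')


def parse_ibhosts(text):
    """Split ibhosts-style output into (nodes, other_lines).

    Hypothetical helper: it only assumes the 'Ca : <guid> ports <n> "<desc>"'
    layout seen in this message; anything else is returned untouched.
    """
    nodes, other = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = CA_LINE.match(line)
        if m:
            guid, ports, desc = m.groups()
            nodes.append({"guid": guid, "ports": int(ports), "desc": desc})
        else:
            other.append(line)  # warnings/errors such as the umad timeout
    return nodes, other


# Shortened sample taken from the output above.
sample = """\
src/query_smp.c:192; umad (DR path slid 0; dlid 0; 0,1,2 Attr 0x11:0) bad status 110; Connection timed out
Ca      : 0x0011750000ff585f ports 1 "compute-0-9 HCA-1"
Ca      : 0x0011750000ff58a2 ports 1 "kratos HCA-1"
"""

nodes, other = parse_ibhosts(sample)
print(len(nodes), "nodes reported")
for line in other:
    print("non-node line:", line)
```

Comparing the parsed node descriptions against the list of hosts expected
in the cluster would show whether any node is missing from the fabric
after the re-cabling.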

 

Where can I get more helpful debug information for this kind of issue?
I ran dmesg, but the only output that looks related is:

>dmesg
ib_qib 0000:07:00.0: IB0:1 got a lid: 0x1
ib_mad: Method 1 already in use

 

Thanks.

Bright Yang
