[openib-general] got ipoib up once but not twice :-)

Ronald G. Minnich rminnich at lanl.gov
Fri Jan 14 12:07:43 PST 2005


OK, I had all of bluesteel up yesterday. It all "just worked".

insmod the right stuff on the front end, i.e. 
ib_ipoib               53856  0
ib_sa                  12564  1 ib_ipoib
ib_umad                12224  5
ib_mthca               90976  9
ib_mad                 29872  3 ib_sa,ib_umad,ib_mthca
ib_core                43264  67 ib_ipoib,ib_sa,ib_umad,ib_mthca,ib_mad

insmod the right stuff on the nodes, i.e. 
ib_mthca               90976  0
ib_umad                12224  0
ib_ipoib               53856  0
ib_sa                  12564  1 ib_ipoib
ib_mad                 29872  3 ib_mthca,ib_umad,ib_sa
ib_core                43264  5 ib_mthca,ib_umad,ib_ipoib,ib_sa,ib_mad
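
A quick way to confirm the same module set is loaded on every node is to
compare lsmod-style text against the list above. This is only a minimal
sketch: the names REQUIRED, loaded_modules and missing_modules are made up
for illustration, and treating the six modules listed here as the required
set is an assumption, not something the drivers mandate.

```python
# Module names copied from the listings above; treating them as the
# required set is an assumption for illustration.
REQUIRED = {"ib_core", "ib_mad", "ib_sa", "ib_umad", "ib_mthca", "ib_ipoib"}


def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style text."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header if present.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def missing_modules(lsmod_text, required=REQUIRED):
    """Return the required modules that do not appear in lsmod_text, sorted."""
    return sorted(set(required) - loaded_modules(lsmod_text))


# The node listing from this message, pasted as text.
NODE_LSMOD = """\
ib_mthca               90976  0
ib_umad                12224  0
ib_ipoib               53856  0
ib_sa                  12564  1 ib_ipoib
ib_mad                 29872  3 ib_mthca,ib_umad,ib_sa
ib_core                43264  5 ib_mthca,ib_umad,ib_ipoib,ib_sa,ib_mad
"""

if __name__ == "__main__":
    print(missing_modules(NODE_LSMOD))
```

On a live node the text would come from running lsmod; here it is pasted in
so the sketch runs anywhere.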

run opensm -v 
   (get lots of messages that look ok)
   (you probably don't want to see this ...)

ifconfig front end
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.4.0.1  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::202:c901:8a0:3e61/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:73 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:4388 (4.2 Kb)

ifconfig bproc slaves

ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.4.2.10  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
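
One thing that can be checked mechanically in output like the two listings
above is the flags line. A minimal sketch, assuming the old-style ifconfig
format shown here; interface_flags is a made-up helper name, and the sample
strings are trimmed copies of the listings in this message:

```python
def interface_flags(ifconfig_text):
    """Return the flag words (UP, RUNNING, ...) from old-style ifconfig output."""
    for line in ifconfig_text.splitlines():
        words = line.split()
        # The flags line is the one that also carries MTU:<n>.
        if any(w.startswith("MTU:") for w in words):
            return {w for w in words if w.isupper() and ":" not in w}
    return set()


FRONT_END = """\
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.4.0.1  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
"""

SLAVE = """\
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.4.2.10  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST MULTICAST  MTU:2044  Metric:1
"""

if __name__ == "__main__":
    print("front end RUNNING:", "RUNNING" in interface_flags(FRONT_END))
    print("slave RUNNING:", "RUNNING" in interface_flags(SLAVE))
```

In the listings as pasted, the front end carries RUNNING and the slave does
not. Whether that difference has anything to do with the problem described
below is not established here; it is just a cheap thing to compare between a
working day and a non-working day.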

All this was fine. 

As of today, however, I've got opensm up ok, and the messages all look ok; 
the kernel messages on slave nodes look fine. But, sadly, no joy on ipoib.


I'm not sure where to start looking given that all the positive indicators 
are still positive. What's a sensible thing to do at this point? 

many thanks

ron


