[openib-general] IPoIB still not working [was IPoIB FAQ Update]

Josh England jjengla at sandia.gov
Sun Dec 5 20:02:56 PST 2004


I'm following the FAQ pretty closely and IPoIB is still not working for
me.  I'm using the latest from SVN (as of tonight).  These are PCIe HCAs
with 4.5.3 firmware (x86_64 MST is horribly broken, BTW; can't wait for
tvflash).  Hardware/cabling is fine since things work under VAPI 3.2.
All modules are loaded, and syslog doesn't show anything.  It seems now
(new with 4.5.3 FW) that packets are flowing in one direction but not the
other.  Here's as much debug info as I could dig up.


ping from n0 to n1:
/sys/class/net/ib0/statistics/rx_packets on n1 increases

ping from n1 to n0:
/sys/class/net/ib0/statistics/rx_packets on n0 remains at 0
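The one-way test above (sample the receiver's counter, ping from the peer, sample again) can be scripted. This is a minimal sketch assuming the sysfs layout quoted in this thread; the helper names read_counter and classify are hypothetical.

```python
from pathlib import Path


def read_counter(iface: str, counter: str = "rx_packets",
                 sysfs_net: str = "/sys/class/net") -> int:
    """Read one per-interface counter, e.g. /sys/class/net/ib0/statistics/rx_packets."""
    path = Path(sysfs_net) / iface / "statistics" / counter
    return int(path.read_text().strip())


def classify(rx_before: int, rx_after: int) -> str:
    """Compare two samples taken on the receiver around a ping from the peer."""
    if rx_after > rx_before:
        return "packets arriving"
    return "nothing received"
```

Run read_counter("ib0") on the receiving node before and after pinging it from the other node. In the results above, n1 would report "packets arriving" and n0 "nothing received".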


tcpdump shows nothing on either node


n0:/ipoib_debugfs/ib0_mcg shows:
GID: ff12:401b:ffff:0:0:0:0:1                  created: 4295359818
queuelen: 0 complete: 1 send_only: 0
GID: ff12:401b:ffff:0:0:0:ffff:ffff            created: 4295359818
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:1                  created: 4295359819
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:2                  created: 4295361295
queuelen: 2 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:0:0:16                 created: 4295359821
queuelen: 0 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:1:ff00:6693            created: 4295359819
queuelen: 0 complete: 1 send_only: 0

n1:/ipoib_debugfs/ib0_mcg shows:
GID: ff12:401b:ffff:0:0:0:0:1                  created: 4294846326
queuelen: 0 complete: 1 send_only: 0
GID: ff12:401b:ffff:0:0:0:ffff:ffff            created: 4294846326
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:1                  created: 4294846327
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:2                  created: 4294847749
queuelen: 2 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:0:0:16                 created: 4294846329
queuelen: 0 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:1:ff00:66bf            created: 4294846327
queuelen: 0 complete: 1 send_only: 0
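Comparing the two ib0_mcg dumps by eye is error-prone. The sketch below parses that two-line-per-group format into records and lists groups that are not complete but have packets queued. The field meanings (queuelen as packets waiting on an unfinished join, complete as join finished) are assumptions read off the field names, and "stalled" is a hypothetical label, not a term from the driver.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class McastGroup:
    gid: str
    created: int      # raw value as printed; units not stated in the dump
    queuelen: int
    complete: bool
    send_only: bool


_GID_RE = re.compile(r"GID:\s*(\S+)\s+created:\s*(\d+)")
_STATE_RE = re.compile(r"queuelen:\s*(\d+)\s+complete:\s*(\d+)\s+send_only:\s*(\d+)")


def parse_mcg(text: str) -> List[McastGroup]:
    """Pair each 'GID: ... created: ...' line with the state line after it."""
    groups: List[McastGroup] = []
    pending = None
    for line in text.splitlines():
        m = _GID_RE.search(line)
        if m:
            pending = (m.group(1), int(m.group(2)))
            continue
        m = _STATE_RE.search(line)
        if m and pending is not None:
            gid, created = pending
            groups.append(McastGroup(gid, created, int(m.group(1)),
                                     m.group(2) == "1", m.group(3) == "1"))
            pending = None
    return groups


def stalled(groups: List[McastGroup]) -> List[McastGroup]:
    """Groups whose join is not complete but which have packets queued."""
    return [g for g in groups if not g.complete and g.queuelen > 0]
```

Applied to either dump above, stalled() returns only ff12:601b:ffff:0:0:0:0:2 (queuelen 2, complete 0, send_only 1), which shows up the same way on both nodes.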


ifconfig -a on n0:
eth0      Link encap:Ethernet  HWaddr 00:30:48:53:69:D8
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32882 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21739542 (20.7 Mb)  TX bytes:4794259 (4.5 Mb)
          Base address:0xbc00 Memory:feae0000-feb00000

ib0       Link encap:UNSPEC  HWaddr 00-00-00-84-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.0.0.1  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::206:6a00:a000:6693/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:2768 (2.7 Kb)

ib1       Link encap:UNSPEC  HWaddr 00-00-00-85-FE-80-00-00-00-00-00-00-00-00-00-00
          BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:70 errors:0 dropped:0 overruns:0 frame:0
          TX packets:70 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6748 (6.5 Kb)  TX bytes:6748 (6.5 Kb)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

ifconfig -a on n1:
eth0      Link encap:Ethernet  HWaddr 00:30:48:53:6A:77
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33587 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21841556 (20.8 Mb)  TX bytes:4897945 (4.6 Mb)
          Base address:0xbc00 Memory:feaa0000-feac0000

ib0       Link encap:UNSPEC  HWaddr 00-00-00-84-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.0.0.2  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::206:6a00:a000:66bf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:45 errors:0 dropped:0 overruns:0 frame:0
          TX packets:69 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:2520 (2.4 Kb)  TX bytes:4372 (4.2 Kb)

ib1       Link encap:UNSPEC  HWaddr 00-00-00-85-FE-80-00-00-00-00-00-00-00-00-00-00
          BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:36 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2940 (2.8 Kb)  TX bytes:2940 (2.8 Kb)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


Is there anything else I can try?

-JE


On Fri, 2004-12-03 at 14:43 -0500, Hal Rosenstock wrote:
> Here's an update to my initial attempt at an IPoIB FAQ:
> 
> ping doesn't work between IPoIB nodes. What should I do?
> 
> First, verify that the ports are active.
> 
> This can be done via:
> 
> cat /sys/class/infiniband/mthca0/ports/1/state
> 
> This should indicate 4: ACTIVE
> 
> assuming the HCA is mthca0 and port 1 is the one plugged into the subnet
> (switch, etc.). 
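The port-state check can be wrapped in a small function. This is a sketch assuming the sysfs path and the "4: ACTIVE" string given in the FAQ; mthca0 and port 1 are defaults taken from that example, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def parse_port_state(text: str) -> Tuple[int, str]:
    """Split a sysfs state string such as '4: ACTIVE' into (4, 'ACTIVE')."""
    number, _, name = text.strip().partition(":")
    return int(number), name.strip()


def port_is_active(hca: str = "mthca0", port: int = 1,
                   sysfs_ib: str = "/sys/class/infiniband") -> bool:
    """True if /sys/class/infiniband/<hca>/ports/<port>/state reads '4: ACTIVE'."""
    state_file = Path(sysfs_ib) / hca / "ports" / str(port) / "state"
    return parse_port_state(state_file.read_text()) == (4, "ACTIVE")
```

Anything other than (4, "ACTIVE") means the port is not active, and the reasons listed next apply.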
> 
> If the port is not active, there could be several reasons:
> 
> 1. You need an SM in your subnet to bring the ports to active. Do you
> have an SM? This can be embedded in a switch or some other IB hardware
> or run on an end node (HCA), although OpenIB (gen2) does not currently
> support this.
> 
> 2. If you have an SM in your subnet, there might be a cabling problem 
> where the SM cannot "reach" your end node.
> 
> If the port is active, indicate the subnet configuration and which SM is
> being utilized.
> 
> Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0"
> show anything on the other nodes when you ping them?
> 
> There are 2 levels of IPoIB debug which can be enabled when building:
> IP-over-InfiniBand debugging and IP-over-InfiniBand data path debugging.
> The latter has performance implications and should only be enabled when
> all else fails. Enable the first level of IPoIB debug and then:
> 
> mount -t ipoib_debugfs none /ipoib_debugfs/
> cat /ipoib_debugfs/ib0_mcg
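Before reading ib0_mcg, it can help to confirm that the debug filesystem is mounted at the path cat expects, since a mismatched mount point leaves nothing to read there. A minimal sketch, assuming the standard /proc/mounts layout (device, mount point, fs type, options, dump, pass); the function names are hypothetical.

```python
def ipoib_debugfs_mounted(mounts_text: str,
                          mountpoint: str = "/ipoib_debugfs",
                          fstype: str = "ipoib_debugfs") -> bool:
    """Look for a matching entry in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if (len(fields) >= 3
                and fields[1].rstrip("/") == mountpoint.rstrip("/")
                and fields[2] == fstype):
            return True
    return False


def check_local_mount() -> bool:
    """Run the check against this machine's mount table."""
    with open("/proc/mounts") as f:
        return ipoib_debugfs_mounted(f.read())
```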
> 
> Other things to verify and supply to help isolate the problem:
> 
> 1. Verify the firmware version via
> 
> cat /sys/class/infiniband/mthca0/fw_ver
> 
> For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version 
> 4.5.3 is recommended.
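Comparing fw_ver against these recommendations is a tuple comparison. This is a sketch that assumes fw_ver holds a dotted "major.minor.patch" string. It also assumes that any version at or above the recommended one is acceptable, which the FAQ does not say.

```python
from typing import Tuple

# Versions recommended in the FAQ.
RECOMMENDED_FW = {"PCI-X": (3, 2, 0), "PCIe": (4, 5, 3)}


def parse_fw_ver(text: str) -> Tuple[int, ...]:
    """Turn the contents of .../mthca0/fw_ver (e.g. '4.5.3') into (4, 5, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def fw_at_least_recommended(text: str, bus: str) -> bool:
    """Assumption: anything at or above the recommended version is acceptable."""
    return parse_fw_ver(text) >= RECOMMENDED_FW[bus]
```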
> 
> 2. Make sure the IB modules are loaded:
> /sbin/lsmod | grep ib_
> should show ib_mthca (HCA driver) as well as ib_ipoib. Other modules
> are involved, but these are the two that need to be loaded; the rest
> will follow as dependencies.
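The lsmod check can be automated by taking the first column of each line. The sketch accepts either lsmod output (whose header row starts with "Module") or /proc/modules, which has no header. The function name is hypothetical.

```python
from typing import List, Sequence

REQUIRED_MODULES = ("ib_mthca", "ib_ipoib")


def missing_modules(lsmod_text: str,
                    required: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the required module names absent from lsmod or /proc/modules text."""
    loaded = {line.split()[0] for line in lsmod_text.splitlines() if line.strip()}
    return [name for name in required if name not in loaded]
```

An empty list means both modules named in the FAQ are loaded.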
> 
> 3. Make sure there are no errors in /var/log/messages pertaining to ib_.
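Skimming /var/log/messages for IB-related errors can be narrowed with a simple filter. This is a sketch only. The keyword list is a hypothetical heuristic and not taken from the driver, so it can miss messages or flag harmless ones.

```python
from typing import List, Sequence

# Hypothetical keyword list; adjust to whatever the driver actually logs.
ERROR_HINTS = ("error", "fail", "timeout")


def suspicious_ib_lines(log_text: str, needle: str = "ib_",
                        hints: Sequence[str] = ERROR_HINTS) -> List[str]:
    """Return log lines that mention `needle` and contain an error-like keyword."""
    needle = needle.lower()
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if needle in lowered and any(h in lowered for h in hints):
            hits.append(line)
    return hits
```

Typical use: pass open("/var/log/messages").read() and inspect whatever comes back by hand.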
> 
> 4. Indicate the IP configuration via
> /sbin/ifconfig -a
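One thing worth confirming from the ifconfig output is that both IPoIB interfaces share an IPv4 subnet. A sketch assuming the "inet addr:... Bcast:... Mask:..." layout shown in this thread; the helper names are hypothetical.

```python
import ipaddress
import re
from typing import Optional

_INET_RE = re.compile(r"inet addr:(\S+)\s+(?:Bcast:\S+\s+)?Mask:(\S+)")


def ipv4_interface(ifconfig_block: str) -> Optional[ipaddress.IPv4Interface]:
    """Extract address/netmask from one interface's ifconfig block, if any."""
    m = _INET_RE.search(ifconfig_block)
    if m is None:
        return None
    return ipaddress.ip_interface(f"{m.group(1)}/{m.group(2)}")


def same_subnet(block_a: str, block_b: str) -> bool:
    """True if both blocks carry IPv4 addresses on the same network."""
    a, b = ipv4_interface(block_a), ipv4_interface(block_b)
    return a is not None and b is not None and a.network == b.network
```

For the ib0 blocks posted above, 10.0.0.1/255.0.0.0 and 10.0.0.2/255.0.0.0 both fall in 10.0.0.0/8, so the addressing itself looks consistent.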
> 
> 
> 
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 
