[openib-general] IPoIB still not working [was IPoIB FAQ Update]
Josh England
jjengla at sandia.gov
Sun Dec 5 20:02:56 PST 2004
I'm following the FAQ pretty closely and IPoIB is still not working for
me. I'm using the latest from SVN (as of tonight). These are PCIe HCAs
with 4.5.3 firmware (x86_64 MST is horribly broken, BTW; can't wait for
tvflash). Hardware and cabling are fine, since things work under VAPI 3.2.
All modules are loaded, and syslog doesn't show anything. It now seems
(new with the 4.5.3 FW) that packets are flowing in one direction but not
the other. Here's as much debug info as I could dig up.
ping from n0 to n1:
/sys/class/net/ib0/statistics/rx_packets on n1 increases
ping from n1 to n0:
/sys/class/net/ib0/statistics/rx_packets on n0 remains at 0
tcpdump shows nothing on either node
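
For anyone wanting to repeat the rx_packets check above, here is a rough sketch of sampling the counter on the receiving node while the other node pings it. The helper names (read_counter, counter_delta) and the 10-second window are my own placeholders, not part of any OpenIB tooling; only the sysfs path comes from the output above.

```python
import time
from pathlib import Path


def read_counter(path):
    """Return the integer stored in a sysfs statistics file."""
    return int(Path(path).read_text().strip())


def counter_delta(path, action=None):
    """Run action() and return how much the counter at path grew.

    By default the action just sleeps for 10 seconds (an arbitrary
    window), which leaves time to start a ping from the other node.
    """
    if action is None:
        action = lambda: time.sleep(10)
    before = read_counter(path)
    action()
    return read_counter(path) - before


if __name__ == "__main__":
    # Path as used in this post; assumes the interface is named ib0.
    path = "/sys/class/net/ib0/statistics/rx_packets"
    print("rx_packets grew by", counter_delta(path))
```

Run it on the node being pinged; a result of 0 matches the "remains at 0" case above.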
n0:/ipoib_debugfs/ib0_mcg shows:
GID: ff12:401b:ffff:0:0:0:0:1 created: 4295359818
queuelen: 0 complete: 1 send_only: 0
GID: ff12:401b:ffff:0:0:0:ffff:ffff created: 4295359818
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:1 created: 4295359819
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:2 created: 4295361295
queuelen: 2 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:0:0:16 created: 4295359821
queuelen: 0 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:1:ff00:6693 created: 4295359819
queuelen: 0 complete: 1 send_only: 0
n1:/ipoib_debugfs/ib0_mcg shows:
GID: ff12:401b:ffff:0:0:0:0:1 created: 4294846326
queuelen: 0 complete: 1 send_only: 0
GID: ff12:401b:ffff:0:0:0:ffff:ffff created: 4294846326
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:1 created: 4294846327
queuelen: 0 complete: 1 send_only: 0
GID: ff12:601b:ffff:0:0:0:0:2 created: 4294847749
queuelen: 2 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:0:0:16 created: 4294846329
queuelen: 0 complete: 0 send_only: 1
GID: ff12:601b:ffff:0:0:1:ff00:66bf created: 4294846327
queuelen: 0 complete: 1 send_only: 0
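
To make the two ib0_mcg dumps easier to compare, here is a small sketch that parses the text in the layout shown above and lists groups that still have packets queued while "complete" is 0. The function names are hypothetical, and reading "complete: 0 with a non-zero queuelen" as a join that has not finished is my assumption about what these fields mean, not something confirmed by the driver documentation.

```python
import re

# Matches one entry in the layout pasted above; \s+ also spans the
# line break between the GID line and the queuelen line.
ENTRY = re.compile(
    r"GID:\s*(?P<gid>\S+)\s+created:\s*(?P<created>\d+)\s+"
    r"queuelen:\s*(?P<queuelen>\d+)\s+complete:\s*(?P<complete>\d+)\s+"
    r"send_only:\s*(?P<send_only>\d+)"
)


def parse_mcg(text):
    """Return one dict per multicast group found in the dump text."""
    groups = []
    for m in ENTRY.finditer(text):
        entry = {"gid": m.group("gid")}
        for key in ("created", "queuelen", "complete", "send_only"):
            entry[key] = int(m.group(key))
        groups.append(entry)
    return groups


def pending(groups):
    """Groups with queued packets whose 'complete' flag is still 0."""
    return [g for g in groups if g["complete"] == 0 and g["queuelen"] > 0]


if __name__ == "__main__":
    # Path as mounted in this post.
    with open("/ipoib_debugfs/ib0_mcg") as f:
        for g in pending(parse_mcg(f.read())):
            print(g["gid"], "queuelen", g["queuelen"])
```

Fed either dump above, this should single out ff12:601b:ffff:0:0:0:0:2, the only entry on both nodes with queuelen 2 and complete 0.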
ifconfig -a on n0:
eth0 Link encap:Ethernet HWaddr 00:30:48:53:69:D8
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:41000 errors:0 dropped:0 overruns:0 frame:0
TX packets:32882 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:21739542 (20.7 Mb) TX bytes:4794259 (4.5 Mb)
Base address:0xbc00 Memory:feae0000-feb00000
ib0 Link encap:UNSPEC HWaddr 00-00-00-84-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.0.1 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::206:6a00:a000:6693/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:2768 (2.7 Kb)
ib1 Link encap:UNSPEC HWaddr 00-00-00-85-FE-80-00-00-00-00-00-00-00-00-00-00
BROADCAST MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:70 errors:0 dropped:0 overruns:0 frame:0
TX packets:70 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6748 (6.5 Kb) TX bytes:6748 (6.5 Kb)
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
ifconfig -a on n1:
eth0 Link encap:Ethernet HWaddr 00:30:48:53:6A:77
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:41747 errors:0 dropped:0 overruns:0 frame:0
TX packets:33587 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:21841556 (20.8 Mb) TX bytes:4897945 (4.6 Mb)
Base address:0xbc00 Memory:feaa0000-feac0000
ib0 Link encap:UNSPEC HWaddr 00-00-00-84-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.0.2 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::206:6a00:a000:66bf/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:45 errors:0 dropped:0 overruns:0 frame:0
TX packets:69 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:2520 (2.4 Kb) TX bytes:4372 (4.2 Kb)
ib1 Link encap:UNSPEC HWaddr 00-00-00-85-FE-80-00-00-00-00-00-00-00-00-00-00
BROADCAST MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:36 errors:0 dropped:0 overruns:0 frame:0
TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2940 (2.8 Kb) TX bytes:2940 (2.8 Kb)
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
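
The asymmetry is easier to see with the per-interface packet counters pulled out of the two ifconfig dumps. Below is a minimal sketch that does this for the old net-tools output format shown above; packet_counts is a name I made up, and the parsing assumes each interface block starts with a "<name> Link encap:" line as it does here.

```python
import re


def packet_counts(ifconfig_text):
    """Map interface name -> {'rx': n, 'tx': n} from ifconfig -a text."""
    counts = {}
    current = None
    for line in ifconfig_text.splitlines():
        # A new interface block begins with "<name> Link encap:...".
        head = re.match(r"\s*(\S+)\s+Link encap:", line)
        if head:
            current = head.group(1)
            counts[current] = {}
            continue
        # "RX packets:N ..." / "TX packets:N ..." lines; byte lines are skipped.
        m = re.match(r"\s*(RX|TX) packets:(\d+)", line)
        if m and current is not None:
            counts[current][m.group(1).lower()] = int(m.group(2))
    return counts


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(
        ["/sbin/ifconfig", "-a"], capture_output=True, text=True
    ).stdout
    for name, c in packet_counts(out).items():
        print(name, c)
```

On the dumps above this gives ib0 rx 0 / tx 46 on n0 and ib0 rx 45 / tx 69 on n1, i.e. n1 receives but n0 does not.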
Is there anything else I can try?
-JE
On Fri, 2004-12-03 at 14:43 -0500, Hal Rosenstock wrote:
> Here's an update to my initial attempt at an IPoIB FAQ:
>
> ping doesn't work between IPoIB nodes. What should I do?
>
> First, verify that the ports are active.
>
> This can be done via:
>
> cat /sys/class/infiniband/mthca0/ports/1/state
>
> This should indicate 4: ACTIVE
>
> assuming the HCA is mthca0 and port 1 is the one plugged into the subnet
> (switch, etc.).
>
> If the port is not active, there could be several reasons:
>
> 1. You need an SM in your subnet to bring the ports to active. Do you
> have an SM? This can be embedded in a switch or some other IB hardware
> or run on an end node (HCA), although OpenIB (gen2) does not currently
> support this.
>
> 2. If you have an SM in your subnet, there might be a cabling problem
> where the SM cannot "reach" your end node.
>
> If the port is active, indicate the subnet configuration and which SM is
> being utilized.
>
> Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0"
> show anything on the other nodes when you try to ping or something?
>
> There are 2 levels of IPoIB debug which can be enabled when building:
> IP-over-InfiniBand debugging and IP-over-InfiniBand data path debugging.
> The latter has performance implications and should only be enabled when
> all else fails. Enable the first level of IPoIB debug and then:
>
> mount -t ipoib_debugfs none /ipoib_debugfs/
> cat /ipoib_debugfs/ib0_mcg
>
> Other things to verify and supply to help isolate the problem:
>
> 1. Verify the firmware version via
>
> cat /sys/class/infiniband/mthca0/fw_ver
>
> For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version
> 4.5.3 is recommended.
>
> 2. Make sure the IB modules are loaded:
> /sbin/lsmod | grep ib_
> should show ib_mthca (HCA driver) as well as ib_ipoib. There are others
> but those are the two which need to be loaded and any others will
> follow.
>
> 3. Make sure there are no errors in /var/log/messages pertaining to ib_.
>
> 4. Indicate the IP configuration via
> /sbin/ifconfig -a