[openib-general] IPoIB FAQ Update
Hal Rosenstock
halr at voltaire.com
Fri Dec 3 11:43:20 PST 2004
Here's an update to my initial attempt at an IPoIB FAQ:
ping doesn't work between IPoIB nodes. What should I do?
First, verify that the ports are active.
This can be done via:
cat /sys/class/infiniband/mthca0/ports/1/state
This should indicate 4: ACTIVE
assuming the HCA is mthca0 and port 1 is the one plugged into the subnet
(switch, etc.).
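As a rough sketch, the check above could be wrapped in a small shell
function for use in scripts. The function name port_is_active is
hypothetical, and the default path assumes HCA mthca0 and port 1 as above;
pass a different state file for another HCA or port.

```shell
# Hypothetical helper (not part of OpenIB): succeed only if the sysfs
# state file reports "4: ACTIVE".
# Default path assumes HCA mthca0, port 1; adjust for your setup.
port_is_active() {
    state_file="${1:-/sys/class/infiniband/mthca0/ports/1/state}"
    # Return 2 if the file cannot be read (for example, no such HCA/port).
    [ -r "$state_file" ] || return 2
    state=$(cat "$state_file")
    # Return 0 when active, 1 for any other state.
    [ "$state" = "4: ACTIVE" ]
}
```

It might be used as: port_is_active || echo "port not active"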
If the port is not active, there could be several reasons:
1. You need an SM (subnet manager) in your subnet to bring the ports to
active. Do you have an SM? It can be embedded in a switch or some other
IB hardware, or run on an end node (HCA), although OpenIB (gen2) does not
currently support the latter.
2. If you have an SM in your subnet, there might be a cabling problem
where the SM cannot "reach" your end node.
If the port is active, report the subnet configuration and which SM is
being used.
Does /sys/class/net/ib0/statistics/rx_packets increase, or does
"tcpdump -i ib0" show any traffic, on the other nodes when you try to ping?
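One way to watch the receive counter is to sample it twice while the ping
is running from the other node. This is only a sketch: the function name
rx_delta and the 5 second default are assumptions, and the default path
assumes the interface is ib0.

```shell
# Hypothetical helper: print how many packets were received on an
# interface during a short interval. Run it on the target node while
# another node is pinging it.
rx_delta() {
    counter="${1:-/sys/class/net/ib0/statistics/rx_packets}"
    seconds="${2:-5}"
    # Sample the counter, wait, then sample again.
    before=$(cat "$counter" 2>/dev/null) || return 1
    sleep "$seconds"
    after=$(cat "$counter" 2>/dev/null) || return 1
    echo $((after - before))
}
```

A result of 0 suggests nothing is arriving on that interface; a nonzero
result means some packets are getting through, though not necessarily the
pings themselves.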
There are 2 levels of IPoIB debug which can be enabled when building:
IP-over-InfiniBand debugging and IP-over-InfiniBand data path debugging.
The latter has performance implications and should only be enabled when
all else fails. Enable the first level of IPoIB debug and then:
mount -t ipoib_debugfs none /ipoib_debugfs/
cat /ipoib_debugfs/ib0_mcg
Other things to verify and supply to help isolate the problem:
1. Verify the firmware version via
cat /sys/class/infiniband/mthca0/fw_ver
For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version
4.5.3 is recommended.
2. Make sure the IB modules are loaded:
/sbin/lsmod | grep ib_
should show ib_mthca (HCA driver) as well as ib_ipoib. There are other
modules, but these are the two that need to be loaded; the rest will
follow.
3. Make sure there are no errors in /var/log/messages pertaining to ib_.
4. Indicate the IP configuration via
/sbin/ifconfig -a
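The module check in item 2 can be scripted along these lines. This is a
sketch only: the function name missing_ib_modules is hypothetical, and it
simply looks for the two module names mentioned above in lsmod output
supplied on standard input.

```shell
# Hypothetical helper: read lsmod output on stdin and print the name of
# each required module (ib_mthca, ib_ipoib) that is NOT loaded.
# Prints nothing when both are present.
missing_ib_modules() {
    # The first column of lsmod output is the module name.
    loaded=$(awk '{print $1}')
    for mod in ib_mthca ib_ipoib; do
        # -x matches the whole line, so ib_mthca will not match ib_mthca2.
        echo "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}
```

It might be used as: /sbin/lsmod | missing_ib_modules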