[openib-general] IPoIB configuration
Hal Rosenstock
halr at voltaire.com
Thu Sep 29 12:25:43 PDT 2005
On Thu, 2005-09-29 at 15:11, Thomas Moschny wrote:
> On Thursday 29 September 2005 20:32, you wrote:
> > Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if
> > the ib0 is 192.168.0.x) ?
>
> The only answer I get is from the sender itself:
>
> $ ping -b 192.168.204.255
> WARNING: pinging broadcast address
> PING 192.168.204.255 (192.168.204.255) 56(84) bytes of data.
> 64 bytes from 192.168.204.1: icmp_seq=0 ttl=64 time=0.029 ms
> 64 bytes from 192.168.204.1: icmp_seq=1 ttl=64 time=0.012 ms
> 64 bytes from 192.168.204.1: icmp_seq=2 ttl=64 time=0.012 ms
>
> --- 192.168.204.255 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> rtt min/avg/max/mdev = 0.012/0.017/0.029/0.009 ms, pipe 2
>
> > Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0" show
> > anything on the other nodes when you try to ping or something ?
>
> No, rx_packets contains '0' on all nodes. And tcpdump doesn't work:
>
> $ tcpdump -i ib0
> tcpdump: ioctl: Value too large for defined data type
>
> > Are there any messages in /var/log/messages pertaining to ib_ ?
>
> Only those mentioned earlier:
> - divert: not allocating divert_blk for non-ethernet device ib0 [and ib1]
> - ib0: no IPv6 routers present
>
> > Also, are there any errors in the OpenSM log ? Can you look there ?
> > Perhaps rerun OpenSM with -V and send the log.
>
> Ok, I did. The log is rather big, and there are some warnings, but no errors.
> For what should I look? Anyway, I attach a compressed version to this mail.
>
> > Please consult http://www.openib.org/docs/ipoib_faq.txt for more info.
>
> I already did, but nevertheless have currently no clue what's the problem.
>
> The only 'anomaly' I noticed was that ifconfig doesn't show the hardware
> address *at all*, not even the higher bytes:
>
> $ ifconfig ib0
> ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> inet addr:192.168.204.2 Bcast:192.168.204.255 Mask:255.255.255.0
> inet6 addr: fe80::202:c902:0:1575/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:128
> RX bytes:0 (0.0 b) TX bytes:360 (360.0 b)
>
> (but ip does, so I didn't worry:
> $ ip ib0
> ip addr show dev ib0
> 5: ib0: <BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128
> link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:15:75 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> inet 192.168.204.2/24 brd 192.168.204.255 scope global ib0
> inet6 fe80::202:c902:0:1575/64 scope link
> valid_lft forever preferred_lft forever)
>
> More seriously, the arp table is empty:
> $ ip neigh show dev ib0
> doesn't show anything.
That is because no other nodes are seen with ping.
In the log, I do see several nodes successfully join the IPoIB broadcast
group, and the multicast tree for it was set up (I didn't actually
validate the tree itself).
PortGid.................0xfe80000000000000 : 0x0002c90200001575
PortGid.................0xfe80000000000000 : 0x0002c90200001581
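As a cross-check, the first PortGid above can be compared with the link
address in your "ip addr show" output. The following is only a rough
sketch: the function name split_ipoib_hwaddr is made up for illustration,
and it assumes the usual 20-byte IPoIB link-layer layout (1 byte of flags,
3 bytes of QPN, then the 16-byte GID).

```python
def split_ipoib_hwaddr(hwaddr: str):
    """Split a 20-byte IPoIB link-layer address (illustrative helper).

    Assumed layout: 1 byte flags, 3 bytes QPN, 8 bytes GID prefix,
    8 bytes GID interface ID (port GUID for a unicast address).
    """
    octets = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    flags = octets[0]
    qpn = int.from_bytes(octets[1:4], "big")
    gid_prefix = int.from_bytes(octets[4:12], "big")
    gid_id = int.from_bytes(octets[12:20], "big")
    return flags, qpn, gid_prefix, gid_id


# Unicast and broadcast addresses copied from the "ip addr show dev ib0" output.
link = "00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:15:75"
brd = "00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff"

for label, addr in (("link", link), ("brd", brd)):
    flags, qpn, gid_prefix, gid_id = split_ipoib_hwaddr(addr)
    print(f"{label}: QPN 0x{qpn:06x}  GID 0x{gid_prefix:016x} : 0x{gid_id:016x}")
```

The GID half of the "link" address is 0xfe80000000000000 :
0x0002c90200001575, which is the first PortGid in the log, so the port
that joined is the one ib0 is bound to. The "brd" address decodes to QPN
0xffffff and a GID starting ff12:401b:ffff, which is what I would expect
for the IPv4 broadcast group on the default P_Key.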
What is your switch configuration ? It looks like a single switch. Is
that right ?
Bottom line, I still can't see anything wrong. You may need to turn on
IPoIB multicast debug in the host stack to see what is not working
correctly.
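While collecting that debug output, it may also help to sample the
interface counters on each node before and after a broadcast ping. This
is a minimal sketch, not a finished tool: the helper name read_counters
and the sysfs_root parameter are made up for illustration, and it only
reads the same /sys/class/net/ib0/statistics files mentioned earlier.

```python
from pathlib import Path


def read_counters(ifname="ib0", sysfs_root="/sys/class/net",
                  names=("rx_packets", "tx_packets")):
    """Return {counter_name: value} read from the interface's sysfs statistics."""
    stats_dir = Path(sysfs_root) / ifname / "statistics"
    return {name: int((stats_dir / name).read_text().strip()) for name in names}


def counter_delta(before, after):
    """Per-counter difference between two read_counters() snapshots."""
    return {name: after[name] - before[name] for name in before}


if __name__ == "__main__":
    try:
        print("ib0", read_counters("ib0"))
    except OSError:
        # Interface or statistics file not present on this host.
        print("ib0: statistics not readable on this host")
```

If tx_packets grows on the sender while rx_packets stays at 0 on every
other node, the packets are being lost somewhere between the sender's
send path and the receivers' multicast attach.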
--- Hal