[openib-general] IPoIB configuration

Hal Rosenstock halr at voltaire.com
Thu Sep 29 12:25:43 PDT 2005


On Thu, 2005-09-29 at 15:11, Thomas Moschny wrote:
> On Thursday 29 September 2005 20:32, you wrote:
> > Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if
> > the ib0 is 192.168.0.x) ?
> 
> The only answer I get is from the sender itself:
> 
> $ ping  -b 192.168.204.255
> WARNING: pinging broadcast address
> PING 192.168.204.255 (192.168.204.255) 56(84) bytes of data.
> 64 bytes from 192.168.204.1: icmp_seq=0 ttl=64 time=0.029 ms
> 64 bytes from 192.168.204.1: icmp_seq=1 ttl=64 time=0.012 ms
> 64 bytes from 192.168.204.1: icmp_seq=2 ttl=64 time=0.012 ms
> 
> --- 192.168.204.255 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> rtt min/avg/max/mdev = 0.012/0.017/0.029/0.009 ms, pipe 2
> 
> > Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0" show
> > anything on the other nodes when you try to ping or something ?
> 
> No, rx_packets contains '0' on all nodes. And tcpdump doesn't work:
> 
> $ tcpdump -i ib0
> tcpdump: ioctl: Value too large for defined data type
> 
> > Are there any messages in /var/log/messages pertaining to ib_ ?
> 
> Only those mentioned earlier:
> - divert: not allocating divert_blk for non-ethernet device ib0 [and ib1]
> - ib0: no IPv6 routers present
> 
> > Also, are there any errors in the OpenSM log ? Can you look there ?
> > Perhaps rerun OpenSM with -V and send the log.
> 
> Ok, I did. The log is rather big, and there are some warnings, but no errors. 
> For what should I look? Anyway, I attach a compressed version to this mail.
> 
> > Please consult http://www.openib.org/docs/ipoib_faq.txt for more info.
> 
> I already did, but nevertheless have currently no clue what's the problem.
> 
> The only 'anomaly' I noticed was that ifconfig doesn't show the hardware 
> address *at all*, even not the higher bytes:
> 
> $ ifconfig ib0
> ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           inet addr:192.168.204.2  Bcast:192.168.204.255  Mask:255.255.255.0
>           inet6 addr: fe80::202:c902:0:1575/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:128
>           RX bytes:0 (0.0 b)  TX bytes:360 (360.0 b)
> 
> (but ip does, so I didn't worry:
> $ ip addr show dev ib0
> 5: ib0: <BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128
>     link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:15:75 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 192.168.204.2/24 brd 192.168.204.255 scope global ib0
>     inet6 fe80::202:c902:0:1575/64 scope link
>        valid_lft forever preferred_lft forever)
> 
> More seriously, the arp table is empty:
> $ ip neigh show dev ib0
> doesn't show anything.

That is because no other nodes are seen with ping.

In the log, I do see several nodes successfully join the IPoIB broadcast
group, and the multicast tree for it was set up (I didn't actually
validate the tree itself).
                              
PortGid.................0xfe80000000000000 : 0x0002c90200001575
                              
PortGid.................0xfe80000000000000 : 0x0002c90200001581
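
These PortGids can be cross-checked against the link-layer address that
"ip addr show" reports on each node. As a rough sketch only (Python; the
helper name parse_ipoib_hwaddr is made up for illustration), the 20-byte
IPoIB address splits into one reserved/flags byte, a 3-byte QPN, and the
16-byte port GID:

```python
def parse_ipoib_hwaddr(hwaddr):
    """Split a 20-byte IPoIB link-layer address into
    (flags, qpn, gid_prefix, gid_guid).

    parse_ipoib_hwaddr is a hypothetical helper, not part of any tool.
    """
    raw = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    flags = raw[0]                                 # reserved/flags byte
    qpn = int.from_bytes(raw[1:4], "big")          # 24-bit queue pair number
    gid_prefix = int.from_bytes(raw[4:12], "big")  # upper 64 bits of the GID
    gid_guid = int.from_bytes(raw[12:20], "big")   # lower 64 bits (port GUID)
    return flags, qpn, gid_prefix, gid_guid


# Address taken from the "ip addr show dev ib0" output quoted above.
addr = "00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:15:75"
flags, qpn, prefix, guid = parse_ipoib_hwaddr(addr)
print("QPN 0x%06x  PortGid 0x%016x : 0x%016x" % (qpn, prefix, guid))
```

For the address above, the GID part comes out as
0xfe80000000000000 : 0x0002c90200001575, which matches the first PortGid
in the log.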

What is your switch configuration? It looks like a single switch. Is
that right?

Bottom line, I still can't see anything wrong. You may need to turn on
IPoIB multicast debug in the host stack to see what is not working
correctly.
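
While collecting that debug output, it may also help to compare the
interface counters on each node before and after a ping. The following
is a minimal sketch (Python), assuming the usual
/sys/class/net/<ifname>/statistics layout; read_counters and
counter_delta are hypothetical helper names:

```python
import os


def read_counters(ifname, names=("rx_packets", "tx_packets"),
                  sysfs_root="/sys/class/net"):
    """Read the named packet counters for one interface from sysfs.

    sysfs_root is a parameter only so the function can be pointed at a
    different directory; the default is the standard sysfs location.
    """
    stats_dir = os.path.join(sysfs_root, ifname, "statistics")
    counters = {}
    for name in names:
        with open(os.path.join(stats_dir, name)) as f:
            counters[name] = int(f.read().strip())
    return counters


def counter_delta(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

Typical use would be to call read_counters("ib0") on a receiving node,
run the broadcast ping from another node, call read_counters("ib0")
again, and pass both snapshots to counter_delta. An rx_packets delta of
zero on every receiver would be consistent with the multicast traffic
never reaching the hosts.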

--- Hal