[openib-general] [RFC] [PATCH V2 0/3] bonding support for operation over IPoIB

Carl Yang (caryang) caryang at cisco.com
Thu Dec 7 14:27:31 PST 2006


Or,

Can you please forward to me (or to the mailing list) "an example bonding
sysfs script which can be used to set bonding to work with patches 1-3?"

Thanks,
Carl


-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Or Gerlitz
Sent: Thursday, November 30, 2006 2:57 AM
To: netdev at vger.kernel.org
Cc: Roland Dreier (rdreier); Jay Vosburgh; openib-general at openib.org
Subject: [openib-general] [RFC] [PATCH V2 0/3] bonding support for operation over IPoIB

This patch series is the second version (links to V1 below) of the
proposed changes to the bonding driver that would allow it to support
non-ARPHRD_ETHER netdevices in its High-Availability (active-backup)
mode.

The motivation is to enable the bonding driver in its HA mode to work
with the IP over InfiniBand (IPoIB) driver. With these patches I was
able to enslave IPoIB netdevices and run TCP, UDP, IP (UDP) multicast
and ICMP traffic, with fail-over and fail-back working fine. My working
environment was the net-2.6.20 git tree.

Moreover, as IPoIB is also the IB ARP provider for the RDMA CM driver,
which is used by native IB ULPs whose addressing scheme is based on IP
(e.g. iSER, SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB
devices **enables** HA for these ULPs. This holds because when the IB HW
informs the ULP of the failure of the current IB connection, the ULP
just needs to reconnect, and the bonding device will then issue the IB
ARP over the active IPoIB slave.

The first patch makes some of the bond netdevice's attributes and
functions those of the active slave when the enslaved device is not of
ARPHRD_ETHER type. Basically, it overrides the settings made by
ether_setup(), which are netdevice **type** dependent and hence might
not be appropriate for devices of other types. It also enforces mutual
exclusion between bonding slaves of dissimilar ether types, as was
concluded in the V1 discussion.
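
A minimal Python model of the enslavement rules just described may help
reviewers. It is a sketch under stated assumptions, not the bonding
driver's C code: BondModel, NetDev and enslave() are hypothetical names,
and only the type and address length stand in for the type-dependent
attributes. The constants match the Linux headers (ARPHRD_ETHER 1 and
ARPHRD_INFINIBAND 32 in if_arp.h, ETH_ALEN 6, INFINIBAND_ALEN 20).

```python
from dataclasses import dataclass, field
from typing import List

# Values as defined in <linux/if_arp.h>, <linux/if_ether.h>
# and <linux/if_infiniband.h>.
ARPHRD_ETHER = 1
ARPHRD_INFINIBAND = 32
ETH_ALEN = 6
INFINIBAND_ALEN = 20


@dataclass
class NetDev:
    name: str
    type: int
    addr_len: int


@dataclass
class BondModel:
    """Hypothetical model of a bond device; not the driver's struct bonding."""
    name: str
    # ether_setup() leaves the bond looking like an Ethernet device.
    type: int = ARPHRD_ETHER
    addr_len: int = ETH_ALEN
    slaves: List[NetDev] = field(default_factory=list)

    def enslave(self, slave: NetDev) -> None:
        if self.slaves:
            # Mutual exclusion: every slave must share the bond's current type.
            if slave.type != self.type:
                raise ValueError(
                    f"{slave.name}: type {slave.type} differs from bond type {self.type}")
        elif slave.type != ARPHRD_ETHER:
            # First, non-Ethernet slave: override the ether_setup() defaults
            # with the slave's type-dependent attributes.
            self.type = slave.type
            self.addr_len = slave.addr_len
        self.slaves.append(slave)
```

For example, enslaving an ib0 device (type ARPHRD_INFINIBAND) to a fresh
model bond switches the bond to ARPHRD_INFINIBAND with a 20-byte address
length, after which an Ethernet slave is rejected.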

The IPoIB MAC address (see Documentation/infiniband/ipoib.txt) is made
of a 3-byte IB QP (Queue Pair) number and the 16-byte IB port GID
(Global ID) of the port this IPoIB device is bound to. The QP is a
resource created by the IB HW and the GID is an identifier burned into
the HCA (I have omitted here some details which are not important for
the bonding RFC).
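
As a sketch of the address layout, assuming the RFC 4391 format that
Linux follows (INFINIBAND_ALEN is 20): among the omitted details is a
leading reserved byte, so the full link-layer address is 1 reserved
byte, the 24-bit QPN and the 128-bit GID. The function names and the
sample QPN/GID below are hypothetical.

```python
import ipaddress
import struct

INFINIBAND_ALEN = 20  # bytes, as in <linux/if_infiniband.h>


def ipoib_hw_addr(qpn: int, gid: str) -> bytes:
    """Pack an IPoIB link-layer address: a reserved byte (left as 0),
    the 3-byte QP number, then the 16-byte port GID (written like IPv6)."""
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QP number must fit in 24 bits")
    # Big-endian 32-bit word: top byte is the reserved byte, low 24 bits the QPN.
    head = struct.pack(">I", qpn)
    addr = head + ipaddress.IPv6Address(gid).packed
    assert len(addr) == INFINIBAND_ALEN
    return addr


def format_hw_addr(addr: bytes) -> str:
    """Colon-separated hex, the usual way link addresses are printed."""
    return ":".join(f"{b:02x}" for b in addr)
```

For a hypothetical QPN 0x404 and GID fe80::2:c902:20:e8b5,
ipoib_hw_addr() yields
00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:20:e8:b5.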

Basically, the IPoIB spec and implementation do not allow setting the
MAC address of an IPoIB device, and this work was done under this
assumption.

Hence, the second patch allows enslaving netdevices which do not
support the set_mac_address() function. In that case the bond MAC
address is that of the active slave, and remote peers are notified of
the MAC address (neighbour) change by a Gratuitous ARP sent by bonding
when fail-over occurs (this is already done by the bonding code).
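
A toy model of this fail-over behaviour, assuming slaves that cannot
change their address (as with IPoIB): ActiveBackupBond, Slave and their
methods are hypothetical names, and the Gratuitous ARP is only recorded
as a (slave, IP, address) tuple rather than actually sent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    name: str
    hw_addr: bytes      # fixed: no set_mac_address() on this device
    link_up: bool = True


class ActiveBackupBond:
    """Hypothetical model of active-backup fail-over; not driver code."""

    def __init__(self, ip: str, slaves: List[Slave]):
        self.ip = ip
        self.slaves = slaves
        self.active: Optional[Slave] = None
        self.hw_addr = b""
        # One (slave name, IP, announced address) entry per Gratuitous ARP.
        self.garps: List[Tuple[str, str, bytes]] = []
        self.fail_over()

    def fail_over(self) -> None:
        candidates = [s for s in self.slaves if s.link_up]
        if not candidates:
            self.active = None
            return
        new = candidates[0]
        if new is self.active:
            return
        self.active = new
        # The slave cannot take the bond's address, so the bond takes the
        # slave's, and peers learn the new IP-to-address binding via GARP.
        self.hw_addr = new.hw_addr
        self.garps.append((new.name, self.ip, self.hw_addr))

    def link_down(self, name: str) -> None:
        for s in self.slaves:
            if s.name == name:
                s.link_up = False
        if self.active is not None and not self.active.link_up:
            self.fail_over()
```

In this model, taking ib0's link down makes ib1 active, changes the
bond's address to ib1's, and records one more Gratuitous ARP.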

Normally, the bonding device is UP before any enslavement takes place.
Once a netdevice is UP, the network stack has it join some multicast
groups (e.g. the all-hosts group 224.0.0.1). Now, since ether_setup()
has set the bonding device type to ARPHRD_ETHER and its address length
to ETH_ALEN, the net core code computes a wrong multicast link address:
for these early joins ip_eth_mc_map() is called, whereas for mcast joins
taking place **after** the enslavement a different ip_xxx_mc_map() is
called (e.g. ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND).
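
The mismatch can be seen by computing both mappings for the same group.
The Python sketch below mirrors the byte layout of the kernel's
ip_eth_mc_map() (01:00:5e plus the low 23 bits of the group) and of an
IPoIB multicast address as laid out in RFC 4391 (reserved byte,
multicast QPN 0xffffff, MGID ff12:401b:<P_Key>:0:0:0:<low 28 bits>).
The function names are hypothetical, and the P_Key is simply a parameter
defaulting to 0xffff; how the kernel helper fills the P_Key bytes is not
modelled here.

```python
import ipaddress
import struct

ETH_ALEN = 6
INFINIBAND_ALEN = 20


def eth_mc_map(group: str) -> bytes:
    """IPv4 group -> Ethernet multicast MAC: 01:00:5e + low 23 bits."""
    addr = int(ipaddress.IPv4Address(group))
    return bytes([0x01, 0x00, 0x5E]) + struct.pack(">I", addr & 0x7FFFFF)[1:]


def ib_mc_map(group: str, pkey: int = 0xFFFF) -> bytes:
    """IPv4 group -> 20-byte IPoIB multicast link address."""
    addr = int(ipaddress.IPv4Address(group))
    head = bytes([0x00, 0xFF, 0xFF, 0xFF])        # reserved byte + multicast QPN
    mgid = (bytes([0xFF, 0x12, 0x40, 0x1B])      # link-local scope, IPv4 signature
            + struct.pack(">H", pkey)            # P_Key
            + bytes(6)                           # zero padding
            + struct.pack(">I", addr & 0x0FFFFFFF))  # low 28 bits of the group
    return head + mgid
```

For 224.0.0.1 the Ethernet mapping gives the 6-byte 01:00:5e:00:00:01,
while a bond of type ARPHRD_INFINIBAND expects a 20-byte address
beginning 00:ff:ff:ff:ff:12:40:1b.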

The third patch handles this problem by allowing devices to be enslaved
while the bonding device is not up. In the discussion of the previous
post this seemed to be the cleanest way to go, and it is not expected to
cause instabilities.

These patches are not enough for configuring IPoIB bonding through
tools (e.g. /sbin/ifenslave and /sbin/ifup) provided by packages such as
sysconfig and initscripts, specifically since these tools set the
bonding device UP before enslaving anything. Once this patchset gets
positive feedback, the next step would be to look at how to enhance the
tools/packages so it would be possible to bond/enslave with the modified
code. As suggested by the bonding maintainer, this step could
potentially involve converting ifenslave into a script based on the
bonding sysfs infrastructure rather than on the somewhat obsolete
Documentation/networking/ifenslave.c.

For the convenience of potential testers, I will post an example bonding sysfs
script which can be used to set bonding to work with patches 1-3 (let me
know!)
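
A hypothetical sketch of the ordering such a setup needs (this is not
the script offered above) can be written in Python against the bonding
sysfs files: bonding_masters and the bond's bonding/mode,
bonding/miimon and bonding/slaves. It assumes patches 1-3 are applied,
the bonding module is loaded, and the slaves are named ib0 and ib1; the
function names are made up for illustration and the miimon value of
100 ms is arbitrary. The key point is that all slaves are added before
the bond is brought UP.

```python
import os
from typing import List, Sequence, Tuple

Step = Tuple[str, str]  # (path relative to the sysfs root, value to write)


def bond_sysfs_steps(bond: str, slaves: Sequence[str],
                     mode: str = "active-backup", miimon: int = 100) -> List[Step]:
    """Ordered sysfs writes for an IPoIB bond, assuming patches 1-3.

    The bond is deliberately left DOWN: with the third patch the slaves are
    enslaved first, so the bond's first multicast joins happen only after it
    has taken on the slaves' (ARPHRD_INFINIBAND) type.
    """
    steps: List[Step] = [
        ("class/net/bonding_masters", f"+{bond}"),          # create the bond
        (f"class/net/{bond}/bonding/mode", mode),           # set before any slave
        (f"class/net/{bond}/bonding/miimon", str(miimon)),  # link check period, ms
    ]
    # The slaves file takes one "+name" per write.
    steps += [(f"class/net/{bond}/bonding/slaves", f"+{s}") for s in slaves]
    return steps


def apply_steps(steps: Sequence[Step], root: str = "/sys") -> None:
    """Write each value in order (needs root privileges on a real /sys)."""
    for rel, value in steps:
        with open(os.path.join(root, rel), "w") as f:
            f.write(value)
```

After apply_steps(bond_sysfs_steps("bond0", ["ib0", "ib1"])), the bond
would then be given its IP address and brought UP with the usual tools,
e.g. `ip addr add <addr>/<len> dev bond0` and `ip link set dev bond0 up`,
so that its multicast joins use the IPoIB mapping.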

Or.

Changes from V1 (the links point to V1 0-3/3):

http://marc.theaimsgroup.com/?l=linux-netdev&m=115926582209736&w=2
http://marc.theaimsgroup.com/?l=linux-netdev&m=115926599515568&w=2
http://marc.theaimsgroup.com/?l=linux-netdev&m=115926599430055&w=2
http://marc.theaimsgroup.com/?l=linux-netdev&m=115926599415729&w=2

+ enforce mutual exclusion on the slaves' ether types
+ don't attempt to set the bond mtu when enslaving a non-ARPHRD_ETHER device
+ rather than hack the bond device ether type through mod params, allow
  enslavement when the bond device is not up

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general