[openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB

Moni Shoua monis at voltaire.com
Tue Feb 27 03:54:59 PST 2007


Thanks for the comments 

>> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
>> instead of the struct neighbour dev one.
> 
> It seems that in this design, if multiple ipoib interfaces are present, we might
> get an skb such that skb->dev will be different from the new dev field in struct
> ipoib_neigh.
> 
> It seems that the result will be that the packet will be sent on a wrong interface.
> Right?
> 
I don't see how. The field dev in ipoib_neigh doesn't take part in interface selection.
As I see it, skb travels this path:
1. Passed to bond_dev->hard_start_xmit
2. bond_dev->hard_start_xmit chooses the current active interface, changes skb->dev and enqueues it back for xmittig.

>> In addition, if an IPoIB device is removed before bonding is unloaded it may 
>> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB
>> device no longer exist. This is why a neighbour cleanup is required during device 
>> cleanup. This cleanup scans the arp cache and the ndisc cache to find there 
>> neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib module is
>> unloaded, the neighbour destructor must be set to NULL because the neighbour function is in
>> ib_ipoib.
>> For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is.
> 
> I wonder about this: is it really true that any allocated neighbour is always in
> either arp_tbl or nd_tbl? For example, could some code have called neigh_hold
> and retained a neighbour that is not in either one of these tables?
> 
I got the assumption about neighbours living in one of these 2 tables from observation and code reading.
I preferred that that on keeping track of all ipoib_neighs and putting them in a list. However, I could 
do that instead of neigh_table scanning. Do you think it's better?
For the example... I didn't understand it. Could you please explain?

>> During my tests I found that when running 
>>
>> 	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
>> 	2. ping somewhere on the subnet of bond0
>>
>> I get this stack dump (which ends with kernel death)
>> 	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
>> 	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
>> 	 [<ffffffff803c3c98>] arp_create+0x120/0x226
>> 	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
>> 	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
>> 	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
>> 	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
>> 	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
>> 	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
>> 	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
>> 	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
>> 	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
>> 	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
>> 	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
>> 	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
>> 	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
>> 	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f
>>
>> The only way I found to avoid this (for now) is to check skb headroom in
>> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
>> operation and it seems to solve my problem. However, I would be happy to hear what
>> others think of this last issue.
> 
> As I said, this seems to indicate a problem in the bonding code.
> But what will happen after you error out in ipoib_hard_header?
> Is the packet dropped? What might break as a result?
> 
I will check the hard_header_len issue in the bonding code more carefully. From first look
it seems that bonding does borrow the hard_header_len.
Also, my checks show that it is safe to return with error from hard_header().
For example,  in neigh_connected_output:

        err = dev->hard_header(skb, dev, ntohs(skb->protocol),
                               neigh->ha, NULL, skb->len);
        read_unlock_bh(&neigh->lock);
        if (err >= 0)
                err = neigh->ops->queue_xmit(skb);
        else {
                err = -EINVAL;
                kfree_skb(skb);
 
>> I would really appreciate comments.
>>
>> thanks
>>
>>  -MoniS
> 






More information about the general mailing list