[openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB

Michael S. Tsirkin mst at mellanox.co.il
Tue Feb 27 06:51:14 PST 2007


> Quoting Moni Shoua <monis at voltaire.com>:
> Subject: Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
> 
> 
> Thanks for the comments 
> 
> >> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> >> instead of the struct neighbour dev one.
> > 
> > It seems that in this design, if multiple ipoib interfaces are present, we might
> > get an skb such that skb->dev will be different from the new dev field in struct
> > ipoib_neigh.
> > 
> > It seems that the result will be that the packet will be sent on a wrong interface.
> > Right?
> > 
> I don't see how. The field dev in ipoib_neigh doesn't take part in interface selection.
> As I see it, skb travels this path:
> 1. Passed to bond_dev->hard_start_xmit
> 2. bond_dev->hard_start_xmit chooses the current active interface, changes skb->dev and enqueues it back for xmittig.

ipoib_neigh ah field includes struct ib_ah *.
This selects important parameters which depend on both packet source and
destination interfaces.

I think the right thing might be to compare ipoib_neigh dev and skb->dev,
and destroy ipoib_neigh if these do not match.

> >> In addition, if an IPoIB device is removed before bonding is unloaded it may 
> >> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB
> >> device no longer exist. This is why a neighbour cleanup is required during device 
> >> cleanup. This cleanup scans the arp cache and the ndisc cache to find there 
> >> neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib module is
> >> unloaded, the neighbour destructor must be set to NULL because the neighbour function is in
> >> ib_ipoib.
> >> For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is.
> > 
> > I wonder about this: is it really true that any allocated neighbour is always in
> > either arp_tbl or nd_tbl? For example, could some code have called neigh_hold
> > and retained a neighbour that is not in either one of these tables?
> > 
> I got the assumption about neighbours living in one of these 2 tables from
> observation and code reading.  I preferred that that on keeping track of all
> ipoib_neighs and putting them in a list. However, I could do that instead of
> neigh_table scanning. Do you think it's better?

If some neighbours are not on any tables, it seems using our own lists
(e.g. lists we have in ipoib_path) is the only option, no?

> For the example... I didn't
> understand it. Could you please explain?

grep for neigh_hold. neighbour is only destroyed when ref count goes to 0.
If some code does neigh_hold, it seems neighbour could be removed from table
but destructor not yet called.

> >> During my tests I found that when running 
> >>
> >> 	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
> >> 	2. ping somewhere on the subnet of bond0
> >>
> >> I get this stack dump (which ends with kernel death)
> >> 	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
> >> 	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
> >> 	 [<ffffffff803c3c98>] arp_create+0x120/0x226
> >> 	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
> >> 	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
> >> 	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
> >> 	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
> >> 	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
> >> 	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
> >> 	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
> >> 	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
> >> 	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
> >> 	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
> >> 	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
> >> 	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
> >> 	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
> >> 	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f
> >>
> >> The only way I found to avoid this (for now) is to check skb headroom in
> >> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
> >> operation and it seems to solve my problem. However, I would be happy to hear what
> >> others think of this last issue.
> > 
> > As I said, this seems to indicate a problem in the bonding code.
> > But what will happen after you error out in ipoib_hard_header?
> > Is the packet dropped? What might break as a result?
> > 
> I will check the hard_header_len issue in the bonding code more carefully.
> From first look it seems that bonding does borrow the hard_header_len.

So where does a shorter message come from?

> Also,
> my checks show that it is safe to return with error from
> hard_header().  For example,  in neigh_connected_output:
> 
>         err = dev->hard_header(skb, dev, ntohs(skb->protocol),
>                                neigh->ha, NULL, skb->len);
>         read_unlock_bh(&neigh->lock);
>         if (err >= 0)
>                 err = neigh->ops->queue_xmit(skb);
>         else {
>                 err = -EINVAL;
>                 kfree_skb(skb);
>  
> >> I would really appreciate comments.
> >>
> >> thanks
> >>
> >>  -MoniS
> > 

-- 
MST




More information about the general mailing list