[ofa-general] [PATCH] ipoib: defer skb_orphan() until irqs enabled

Eli Cohen eli at dev.mellanox.co.il
Thu Sep 11 04:37:46 PDT 2008


On Wed, Sep 10, 2008 at 11:26:16AM -0700, Roland Dreier wrote:
> On Tue, Sep 09, 2008 at 02:32:44PM -0700, Roland Dreier wrote:
> > By the way, looking at this stuff again, it seems we have (a possibly
> > quite unlikely) race where a send can complete before the xmit method
> > finishes, and we end up running skb_orphan on an skb that another
> > context has already freed.  I'll have to think about how we can fix
> > that -- but any good ideas are appreciated...
> 
> Actually it looks like Arthur's patch introduces this race.  The current
> code is OK because skb_orphan is called under tx_lock, which is also
> held when we poll the send CQ.  But of course the status quo is no good
> exactly because of the locking issue Arthur found.
> 
>  > We can check if there are outstanding WRs after poll_tx is called. If
>  > there are no outstanding WRs, it means that the SKB has been freed. If
>  > there are outstanding WRs, it means that the last post has not been
>  > freed so we can call skb_orphan(). Like the following patch (on top of
>  > Arthur's):
> 
> I don't think this closes the race completely: at the point skb_orphan
> is called (after Arthur's patch, by design), we have no locks held.  And
> so the timer-driven send completion handling could already have run and
> freed the skb between when we drop tx_lock and when we call skb_orphan.
> 
I don't think there is a problem. The only SKB which is subject to
this race, is the one we that we posted right after stopping the net
queue. But the interrupt handler (resulting from arming the CQ) and
possibly the following timer invocations, will drain the CQ up to the
point where there are half the queue outstanding WRs. But and this one
is at the other half of the queue.



More information about the general mailing list