[ofa-general] Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

David Miller davem at davemloft.net
Wed Sep 19 09:05:57 PDT 2007


From: Krishna Kumar <krkumar2 at in.ibm.com>
Date: Wed, 19 Sep 2007 17:24:03 +0530

> Note: during steps F-H and C-E, priv/napi is read/modified by both cpu's
> 	which is another bug relating to the same race.
> 
> I guess the above patch is not required if this bug (in IPoIB) is fixed?

The NAPI_STATE_SCHED flag bit should provide all of the necessary
synchronization.

Only the setter of that bit should add the NAPI instance to the
polling list.

The polling loop runs atomically on the cpu where the NAPI instance
got added to the per-cpu polling list.  And therefore decisions to
complete NAPI are serialized too.

That serialized completion decision is also when the list deletion
occurs.
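
To make the rule above concrete, here is a minimal userspace sketch in
C11.  It is only a model under stated assumptions, not the kernel code:
every name in it (model_napi, model_poll_list, model_schedule_prep,
model_schedule, model_complete, MODEL_STATE_SCHED) is hypothetical, a
single list stands in for one cpu's polling list, and interrupt
disabling is left out entirely.  The point it tries to show is that
only the caller that flips the bit from clear to set gets to add the
instance to the list, and that the bit is cleared at the same place the
list deletion happens.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these names are illustrative, not the kernel API. */
enum { MODEL_STATE_SCHED = 1u << 0 };

struct model_napi {
	atomic_uint state;
	struct model_napi *next;	/* link on one cpu's polling list */
};

struct model_poll_list {
	struct model_napi *head;	/* assumed to be touched by one cpu only */
};

/* Atomically set the SCHED bit; true only for the caller that set it. */
static bool model_schedule_prep(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_STATE_SCHED);

	return !(old & MODEL_STATE_SCHED);
}

/* Only the setter of the bit adds the instance to the polling list. */
static bool model_schedule(struct model_poll_list *l, struct model_napi *n)
{
	if (!model_schedule_prep(n))
		return false;	/* someone else already owns the instance */

	n->next = l->head;
	l->head = n;
	return true;
}

/* Completion: delete from the list first, then release the bit. */
static void model_complete(struct model_poll_list *l, struct model_napi *n)
{
	struct model_napi **pp = &l->head;

	/* Completing an instance that was never scheduled is a bug. */
	assert(atomic_load(&n->state) & MODEL_STATE_SCHED);

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;
	n->next = NULL;

	atomic_fetch_and(&n->state, ~(unsigned int)MODEL_STATE_SCHED);
}
```

In this model a second model_schedule() call on an already scheduled
instance does nothing, so the list is only ever modified by whoever
holds the bit.  A reschedule path that re-adds the instance after
completion has already been decided does not fit that picture cleanly.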

I'm starting to suspect the whole problem comes from the resched
facility, and now I really don't blame Stephen for trying to delete
it.  Semantically it really makes things very difficult, especially
wrt. the atomicity of the list handling.


