[ofa-general] Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

Krishna Kumar2 krkumar2 at in.ibm.com
Wed Sep 19 22:10:33 PDT 2007


Hi Jan-Bernd,

Jan-Bernd Themann <ossthema at de.ibm.com> wrote on 09/19/2007 06:53:48 PM:

> If I understood it right the problem you describe (quota update in
> __napi_schedule) can cause further problems when you choose the
> following numbers:
>
> CPU1: A. process 99 pkts
> CPU1: B. netif_rx_complete()
> CPU2: interrupt occurs, netif_rx_schedule is called, net_rx_action triggered:
> CPU2: C. set quota = 100 (__napi_schedule)
> CPU2: D. call poll(), process 1 pkt
> CPU2: D.2 call netif_rx_complete() (quota not exceeded)
> CPU2: E. net_rx_action: set quota=99
> CPU1: F. net_rx_action: set quota=99 - 99 = 0
> CPU1: G. modify list (list_move_tail) although netif_rx_complete has been called
>
> Step G would fail as the device is not in the list due
> to netif_rx_complete. This case can occur for all
> devices running on an SMP system where interrupts are not pinned.
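
The interleaving quoted above can be replayed in order as a small user-space sketch. This is only a model of the arithmetic, not kernel code: struct napi_model, its fields, and replay_race() are hypothetical names, and it assumes the requeue branch in step G is taken once the shared quota drops to zero.

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct napi_struct. */
struct napi_model {
    int weight;    /* per-poll packet limit */
    int quota;     /* shared counter that both CPUs update */
    bool on_list;  /* whether the device is linked on a poll list */
};

/*
 * Replays steps A..G in the quoted order. Returns true if step G would
 * operate on a device that is no longer on the list.
 * Assumption: the requeue branch fires when quota <= 0.
 */
bool replay_race(struct napi_model *n)
{
    int work_cpu1 = 99;       /* A:   CPU1 poll() handles 99 packets */
    n->on_list = false;       /* B:   CPU1 netif_rx_complete() */

    n->quota = n->weight;     /* C:   CPU2 __napi_schedule resets quota */
    n->on_list = true;        /*      and puts the device back on a list */
    int work_cpu2 = 1;        /* D:   CPU2 poll() handles 1 packet */
    n->on_list = false;       /* D.2: CPU2 netif_rx_complete() */
    n->quota -= work_cpu2;    /* E:   quota = 100 - 1 = 99 */

    n->quota -= work_cpu1;    /* F:   CPU1 resumes, quota = 99 - 99 = 0 */

    /* G: quota exhausted, so the requeue would run on an unlisted device */
    return n->quota <= 0 && !n->on_list;
}
```

Calling replay_race() on a model with weight = 100 ends with quota at 0 while on_list is false, which is the state step G describes.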

I think list_move should be OK whether the device is on the list or not.
But it should not reach that code, since work (99) != weight (100).
If work == weight, the driver would not have called netif_rx_complete,
and next/prev would not be set to POISON.
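
A rough user-space sketch of that reasoning, under the assumption that the poll loop requeues only when work == weight and that the driver completes only when it handled fewer than weight packets. struct dev_model, model_poll() and model_rx_step() are hypothetical names, not kernel functions.

```c
#include <stdbool.h>

/* Hypothetical model of one device; not a kernel structure. */
struct dev_model {
    int weight;      /* per-poll packet limit */
    int pending;     /* packets waiting to be processed */
    bool completed;  /* stands in for netif_rx_complete() having run */
};

/*
 * Driver poll: handle up to `weight` packets and complete only when
 * fewer than `weight` were handled (nothing left to do).
 */
int model_poll(struct dev_model *d)
{
    int work = d->pending < d->weight ? d->pending : d->weight;

    d->pending -= work;
    if (work < d->weight)
        d->completed = true;
    return work;
}

/*
 * One step of the poll loop. Returns true when the loop would requeue
 * the device (the list_move_tail case), i.e. only when work == weight.
 */
bool model_rx_step(struct dev_model *d)
{
    d->completed = false;
    int work = model_poll(d);

    return work == d->weight;
}
```

With pending = 99 and weight = 100 the step completes the device and does not requeue it. With pending = 100 it requeues without completing. In this model the two outcomes never occur together.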

I like the clean changes made by Dave to fix this, and will test it
today (if I can get my crashed system to come up).

Thanks,

- KK



