[ofa-general] [PATCH] IPoIB: keep ib_wc[] on stack

Wed May 14 00:25:48 PDT 2008

On Tue, 2008-05-13 at 18:21 -0700, akepner at sgi.com wrote:
> We're getting panics like this one on big clusters:
> 
> skb_over_panic: text:ffffffff8821f32e len:160 put:100 head:ffff810372b0f000 data:ffff810372b0f01c tail:ffff810372b0f0bc end:ffff810372b0f080 dev:ib0

RX SKBs are large enough to contain 100 bytes... this looks like
corruption. Can you give more information on OS, kernel version, OFED
version.

> 
> Started looking into what might cause this and I found that IPoIB 
> always does something like this:
> 
> int ipoib_poll(struct net_device *dev, int *budget) 
> { 
> 	struct ipoib_dev_priv *priv = netdev_priv(dev);
> 	....
> 	ib_poll_cq(priv->rcq, t, priv->ibwc);
> 
> 	for (i = 0; i < n; i++) {
> 		struct ib_wc *wc = priv->ibwc + i;
> 		....
> 		ipoib_ib_handle_rx_wc(dev, wc);
> 		
> 
> What happens if we call ib_poll_cq() then, before processing the 
> rx completions in ipoib_ib_handle_rx_wc(), ipoib_poll() gets called 
> again (on a different CPU)? That could corrupt the priv->ibwc array, 
> and lead to a panic like above. 

>From NAPI_HOWTO.txt, although the file has been removed but I think the
statement is still valid:

-Guarantee: Only one CPU at any time can call dev->poll(); this is
because only one CPU can pick the initial interrupt and hence the
initial netif_rx_schedule(dev);

> 
> How about keeping the array of struct ib_wc on the stack? 
The stack is limited for kernel code and putting this on the stack is
limiting. I think this could hurt performance too due to more cache
misses.