[ofa-general] Re: [PATCH 4/4] [RFC] IPoIB/cm: Add connected mode support for devices without SRQs

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Fri Nov 2 19:07:46 PDT 2007


> 
> OTOH it is quite possible that ipoib is corrupting an skb somehow so
> that when it gets reused by e1000, you see a crash.  The fact that you
> were running netperf on IB when e1000 crashed is somewhat suspicious.

Yes, those were exactly my lingering suspicions. I ran several iterations
of netperf on e1000 and saw no crashes, so I started looking at the patch
more closely. I think I am on to something now.

In ipoib_cm_handle_rx_wc() I see two issues (I have not yet looked at the
latest changes that you mentioned earlier today):

1. I do not understand the usage and purpose of recv_count (something new that
you have introduced). Can you please explain it? My suspicion is that if the
if clause is somehow executed, the rx_ring gets freed, so all the skb pointers
in it become bogus. I have commented out this segment of code.

2. The call to ipoib_cm_alloc_rx_skb() in ipoib_cm_handle_rx_wc() uses a
hard-coded index of 0, which is incorrect for the no-SRQ case. I have changed
it to use index instead.
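To make point 2 concrete, here is a minimal user-space sketch, assuming (as the
description above implies) that a receive completion identifies the ring slot
it consumed and that the replacement buffer must be posted back into that same
slot. struct buf, struct rx_ring, refill() and handle_rx_completion() are
hypothetical stand-ins, not the real IPoIB structures or the real
ipoib_cm_alloc_rx_skb() signature.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical stand-in for an skb: an id lets us tell buffers apart. */
struct buf {
	int id;
};

/* Hypothetical stand-in for a per-connection (no-SRQ) receive ring. */
struct rx_ring {
	struct buf *slot[RING_SIZE];
	int next_id;
};

/* Allocate a fresh buffer and post it into slot `index`
 * (loosely models what the refill allocation is for). */
static int refill(struct rx_ring *ring, int index)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return -1;
	b->id = ring->next_id++;
	ring->slot[index] = b;
	return 0;
}

/* Post a buffer into every slot of an empty ring. */
static int ring_init(struct rx_ring *ring)
{
	ring->next_id = 0;
	for (int i = 0; i < RING_SIZE; i++)
		if (refill(ring, i) != 0)
			return -1;
	return 0;
}

/* Handle a completion for slot `index`: hand its buffer up the stack and
 * refill. If buggy_index_zero is nonzero, reproduce the suspected bug: the
 * refill lands in slot 0 instead of `index`. Returns the buffer passed up,
 * or NULL if allocation failed. */
static struct buf *handle_rx_completion(struct rx_ring *ring, int index,
					int buggy_index_zero)
{
	struct buf *up;

	assert(index >= 0 && index < RING_SIZE);
	up = ring->slot[index];	/* ownership passes to the stack */
	if (refill(ring, buggy_index_zero ? 0 : index) != 0)
		return NULL;
	return up;
}
```

With the hard-coded 0, slot `index` keeps pointing at a buffer that has already
been handed to the stack, while slot 0 is overwritten even though its buffer is
still posted. A later completion on either slot could then pass up or reuse a
buffer the stack already owns, which would be consistent with (though not proof
of) the skb corruption suspected above.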

I have been running with these changes for some hours now, with no crashes and
no errors. This is using SLUB. If I get a chance, I will run with SLAB over the
weekend and let you know the results.

Pradeep
