[ofa-general] Re: [PATCH 4/4] [RFC] IPoIB/cm: Add connected mode support for devices without SRQs
Pradeep Satyanarayana
pradeeps at linux.vnet.ibm.com
Fri Nov 2 19:07:46 PDT 2007
>
> OTOH it is quite possible that ipoib is corrupting an skb somehow so
> that when it gets reused by e1000, you see a crash. The fact that you
> were running netperf on IB when e1000 crashed is somewhat suspicious.
Yes, exactly the lingering suspicions that I had. I ran several iterations
of netperf on e1000 and there were no crashes. So, I started looking at the
patch more closely. I think I am on to something now.
In ipoib_cm_handle_rx_wc() I see two things (I have not yet looked at the
latest changes that you mentioned earlier today):
1. I do not understand the usage and purpose of recv_count (something new that
you have introduced). Can you please explain? My suspicion is that if the
if clause is somehow executed, the rx_ring gets freed and so all the skb
pointers are bogus. I have commented out this segment of code.
2. The call to ipoib_cm_alloc_rx_skb() in ipoib_cm_handle_rx_wc() uses a
hard-coded index value of 0, which is incorrect for the no-SRQ case. I have
changed it to use index instead.
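
As a rough illustration of point 2, here is a toy model of a receive ring. It
is not the actual ipoib code: every name in it (toy_rx_buf, toy_ring_init,
toy_handle_completion, toy_count_empty, RING_SIZE) is made up for this sketch,
and it only assumes that each completion consumes the buffer in one slot and
that a replacement buffer must go back into that same slot.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4  /* hypothetical ring size, for illustration only */

/* Hypothetical stand-in for one receive ring slot; not the real ipoib struct. */
struct toy_rx_buf {
    int posted;  /* 1 if a fresh buffer currently sits in this slot */
};

/* Mark every slot as holding a posted buffer. */
static void toy_ring_init(struct toy_rx_buf *ring)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        ring[i].posted = 1;
}

/*
 * A completion consumes the buffer in slot `id`, then a replacement
 * buffer is placed into slot `refill`. The two should be the same slot.
 */
static void toy_handle_completion(struct toy_rx_buf *ring, size_t id,
                                  size_t refill)
{
    assert(id < RING_SIZE && refill < RING_SIZE);
    ring[id].posted = 0;      /* buffer handed up the stack */
    ring[refill].posted = 1;  /* replacement buffer stored in `refill` */
}

/* Complete every slot once and count the slots left without a buffer. */
static size_t toy_count_empty(int use_fixed_zero)
{
    struct toy_rx_buf ring[RING_SIZE];
    size_t empty = 0;

    toy_ring_init(ring);
    for (size_t id = 0; id < RING_SIZE; id++)
        toy_handle_completion(ring, id, use_fixed_zero ? 0 : id);

    for (size_t i = 0; i < RING_SIZE; i++)
        if (!ring[i].posted)
            empty++;
    return empty;
}
```

In this model, refilling with a fixed slot of 0 leaves every other slot without
a valid buffer after one pass, while refilling with the completed slot's own
index keeps the ring full.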
I have been running this for some hours now; no crashes and no errors. This is
using SLUB. If I get a chance I will run with SLAB over the weekend and let you
know the results.
Pradeep