[ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free

Al Chu chu11 at llnl.gov
Thu Nov 6 09:23:56 PST 2008


On Thu, 2008-11-06 at 08:04 -0800, akepner at sgi.com wrote:
> On Thu, Nov 06, 2008 at 10:40:32AM +0200, Eli Cohen wrote:
> > On Wed, Nov 05, 2008 at 05:23:07PM -0800, akepner at sgi.com wrote:
> > ...
> > looking at the patch I don't understand why it should fix the problem
> > you're seeing. I suspect we may be hiding the problem.
> > 
> 
> I think that may be correct. 
>
> For the stale skb pointers to be reused by the ipoib driver, it 
> looks like we'd need to get 'unexpected' completions. 

I implemented the attached cheapo-debug-patch and installed it on one of
our clusters.  We hit the error condition (the "Oh crap" error message)
several times right before the same crashes.  So I think Arthur's patch
fixes something, although there may be a deeper underlying issue still
to be solved.
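
For readers without the attachment, here is a rough userspace sketch of
the general idea only: null a ring slot's buffer pointer when the buffer
is freed, and flag any completion that arrives for a slot whose pointer
is already NULL. It is not the attached patch and not ipoib driver code.
Every name in it (fake_buf, ring_entry, ring_post, ring_complete,
RING_SIZE, demo_double_completion) is hypothetical, and fake_buf is only
a stand-in for a real skb.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define RING_SIZE 4                 /* hypothetical size, illustration only */

struct fake_buf { int id; };        /* stand-in for an skb; not a kernel type */
struct ring_entry { struct fake_buf *buf; };

static struct ring_entry ring[RING_SIZE];
static int stale_hits;              /* count of unexpected completions seen */

/* Attach a freshly allocated buffer to a slot; -1 if the slot is in use. */
static int ring_post(unsigned int slot, int id)
{
    struct fake_buf *b;

    assert(slot < RING_SIZE);
    if (ring[slot].buf != NULL)
        return -1;
    b = malloc(sizeof(*b));
    if (b == NULL)
        abort();
    b->id = id;
    ring[slot].buf = b;
    return 0;
}

/*
 * Handle a completion for a slot.  A NULL pointer means the buffer was
 * already freed, so the completion is unexpected: report it and bail out
 * instead of touching freed memory.  Returns 0 on success, -1 otherwise.
 */
static int ring_complete(unsigned int slot)
{
    assert(slot < RING_SIZE);
    if (ring[slot].buf == NULL) {
        fprintf(stderr, "unexpected completion on slot %u\n", slot);
        stale_hits++;
        return -1;
    }
    free(ring[slot].buf);
    ring[slot].buf = NULL;          /* null on free, so reuse is detectable */
    return 0;
}

/* Post one buffer, complete it twice, return how many stale hits occurred. */
static int demo_double_completion(void)
{
    stale_hits = 0;
    ring_post(1, 42);
    ring_complete(1);               /* normal completion */
    ring_complete(1);               /* duplicate: pointer is already NULL */
    return stale_hits;
}
```

Without the NULL assignment in ring_complete(), the second completion
would free the same pointer again, which is the kind of stale-pointer
reuse being discussed in this thread.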

Al

P.S.  I should note that when debugging this, I was looking at a
different stack trace than Arthur and Ira, but believed it to be the
same core issue.

-- 
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: verify_skb_reset.patch
Type: text/x-patch
Size: 3196 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20081106/b093e4d5/attachment.bin>

