[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

Helen Chen hycsw at ca.sandia.gov
Thu Oct 13 16:38:12 PDT 2005


Roland,

>From rolandd at cisco.com Thu Oct 13 16:19:30 2005
>
>    Helen> BTW, the state of the IPoIB network seemed fine after the
>    Helen> failed test, and the mthca counters are moving up nicely.
>
>Even on the server on3-ib?

Yes, even on the server on3-ib.

>
>    Helen> Do you still think this is a crash of the HCA firmware?
>    Helen> Should I call Mellanox?
>
>Not if IPoIB is working on the systems printing the TX time out
>messages.  However, if everything stops working on one of your
>systems, then yes, an HCA crash is likely.
>
>I'm still a bit unclear on what is happening.  Do you see TX time
>out messages on a particular server, but IPoIB and mthca counters
>still work fine on that same server?  Or is it just the rest of the
>fabric that continues working?
>

Not in real time.  My observations were made after the fact.  I suppose
I can launch another test and watch the counters in real time, if you
believe that is necessary.

>Thanks,
>  Roland

Thank you so much for the speedy fix.  I will apply the patch and 
stress test it as soon as possible.

Helen :-)



