[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

Helen Chen hycsw at ca.sandia.gov
Thu Oct 13 14:21:16 PDT 2005


Roland,

>From rolandd at cisco.com Thu Oct 13 13:53:05 2005
>
>    Helen> Roland, Thank you for your response.  That fixed my initial
>    Helen> buffer allocation failure.  After we tuned Lustre and
>    Helen> reran the same IOZONE tests, we got the following
>    Helen> problem.  Was there an actual network interruption? If so, the
>    Helen> problem is not obvious now; the two nodes are pinging over
>    Helen> IPoIB.  Please advise.
>
>That's very odd.  This message:
>
>    Helen> NETDEV WATCHDOG: ib0: transmit timed out
>    Helen> ib0: transmit timeout: latency 1846
>
>says that we are not seeing send completions from the HCA.  However,
>are you saying that even when you are seeing this message, ping over
>IPoIB is working?
>

No, I didn't know there was any problem until IOZONE reported a read
error from the Lustre client.

BTW, the backend storage is iSCSI over 10 GbE using jumbo frames.  This
problem only appeared after our tuning effort: we increased the iSCSI
payload to 1 MB and increased the TCP window from 256 KB to 512 KB. I
will shrink my TCP window and see if the problem goes away.

Thanks,
Helen


