[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer
Helen Chen
hycsw at ca.sandia.gov
Thu Oct 13 16:15:38 PDT 2005
Roland,
So you are right, it is not a moving target. After repeating
the IOZONE tests several times, I narrowed the culprit down to
server on3-ib. Parallel I/O had made it a bit difficult to
chase down :-(
BTW, the state of the IPoIB network seemed fine after the failed
test, and the mthca counters are moving up nicely. Do you still
think this is a crash of the HCA firmware? Should I call Mellanox?
Thanks,
Helen
---------- Original Message -----------------
>From rolandd at cisco.com Thu Oct 13 15:13:16 2005
>
> Helen> It doesn't seem like shrinking the TCP window had helped.
> Helen> I captured the Dmesg log from Lustre server and associated
> Helen> client reporting IOZONE error.
>
>What is the state of the system after you start seeing the ib0
>transmit time out messages? Does IPoIB work at all? Is the HCA
>responsive at all -- for example what do you see if you do
>
> cat /sys/class/infiniband/mthca0/ports/1/state
>
>or
>
> cat /sys/class/infiniband/mthca0/ports/1/counters/*
>
> Helen> BTW, this problem is a moving target so it is hard to
> Helen> believe that it is hardware related(?) BTW, I am using the
> Helen> mellanox DDR switch and HCA.
>
>Not sure what you mean by a moving target... the symptoms really look
>like a crash of the HCA firmware to me.
>
>Thanks,
> Roland
>
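
A minimal sketch of one way to check whether the counters are "moving
up" after a failed test: take two snapshots of the sysfs files named in
the quoted commands and list the counters whose values changed. The
function names snapshot_port and changed_counters are hypothetical, and
the default path assumes device mthca0, port 1, as in the commands
above; adjust it for other devices or ports.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any OpenIB tool.

# Print the port state and every counter under a port directory.
# $1 is the port directory; the default is the path from the quoted commands.
snapshot_port() {
    port_dir=${1:-/sys/class/infiniband/mthca0/ports/1}
    printf 'state: %s\n' "$(cat "$port_dir/state")"
    for f in "$port_dir"/counters/*; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Compare two snapshot files and print "name old -> new" for each
# counter whose value differs between them.
changed_counters() {
    awk 'NR == FNR { old[$1] = $2; next }
         $1 != "state:" && ($1 in old) && old[$1] != $2 {
             print $1, old[$1], "->", $2
         }' "$1" "$2"
}

# Example use on the server under suspicion (uncomment to run):
#   snapshot_port > /tmp/ib_before
#   sleep 10
#   snapshot_port > /tmp/ib_after
#   changed_counters /tmp/ib_before /tmp/ib_after
```

If the state line still reads ACTIVE and the traffic counters change
between snapshots, the port is at least passing data; that alone does
not rule out a firmware problem.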