[openib-general] cmpost test failures
Sean Hefty
mshefty at ichips.intel.com
Mon Apr 24 11:05:37 PDT 2006
Ali Ayoub wrote:
> 1. If I change the local and the remote timeout for ib_cm_req_param to
> 40 (instead of 20, the default value) it causes kernel oops.
The timeout is calculated as 4.096 usec x 2 ^ timeout. In highly technical terms,
going from 20 to 40 increases the timeout by a factor of a lot (2^20, which takes
it from about 4 seconds to about 7 weeks).
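To put numbers on that, here is a minimal sketch of the arithmetic. The function name is made up for illustration and is not part of ib_cm; the only thing it assumes is the 4.096 usec base unit that the timeout exponent is applied to.

```python
# Sketch only: cm_timeout_seconds is a hypothetical helper, not an ib_cm call.
BASE_USEC = 4.096  # base unit of the timeout encoding, in microseconds


def cm_timeout_seconds(exponent: int) -> float:
    """Return 4.096 usec * 2^exponent, converted to seconds."""
    return BASE_USEC * (2 ** exponent) / 1e6


for exponent in (20, 40):
    secs = cm_timeout_seconds(exponent)
    print(f"timeout={exponent}: {secs:.1f} s = {secs / 86400:.2f} days")
```

A value of 20 works out to roughly 4.3 seconds, and 40 to roughly 52 days.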
Since the oops occurred in cmpost, I'm not overly concerned with trying to debug
this at the moment. (I will happily take a patch that fixes the issue, or will
look at it more if it definitely looks like an ib_cm bug. Cmpost just isn't
meant to be a robust test program.)
> 2. With the following parameters:
>
> connections = 3000
> message_size = 200
> message_count = 10
> qp_type = RC
>
> The test fails inconsistently; in some cases it causes a kernel oops,
This setup will result in allocating a fair amount of memory, which could
explain the failures. The oops may be related, but I can't tell just from the
backtrace. I've never run into this myself though. Can you reproduce this
issue using a smaller number of connections?
Note that when simultaneously establishing a large number of connections, you
will end up overrunning QP 1 on the remote side. This will result in a lot of
dropped MADs, timeouts, and retries, which can make the results of the test
unpredictable.
> 3. In other cases the server fails because it receives some
> IB_CM_DREQ_ERROR when the client receives all the IB_CM_DREQ_RECEIVED.
This can occur, and is easier to reproduce for a large number of connections. A
DREQ is retried until a DREP is received. However, since a DREP is not acked,
once it has been sent, the disconnect is done from the client's perspective. If
the DREP is lost, the server will see a DREQ timeout.
There is code in the ib_cm to resend a DREP in response to a repeated DREQ, but
the state needed to generate the DREP is only maintained while the old
connection is in timewait.
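The exchange can be illustrated with a toy model. This is not the ib_cm code: the function, its parameters, and the retry count are all invented for the example, and timewait is modelled simply as "the number of repeated DREQs the peer can still answer". Only the event names are borrowed from ib_cm.

```python
# Toy model of the DREQ/DREP exchange; NOT the ib_cm implementation.
# disconnect(), its parameters, and max_dreq_retries are hypothetical.

def disconnect(drep_delivered, timewait_retries, max_dreq_retries=3):
    """Return the event seen by the side that sends the DREQ.

    drep_delivered: one boolean per DREP the peer sends; True means the
        DREP actually arrives, False means it is lost.
    timewait_retries: how many repeated DREQs the peer can still answer
        before its timewait state is gone.
    """
    dreps_sent = 0
    for attempt in range(1 + max_dreq_retries):
        # The peer answers the first DREQ (attempt 0), and answers a
        # repeated DREQ only while the old connection is in timewait.
        if attempt > timewait_retries:
            continue  # repeated DREQ goes unanswered
        delivered = dreps_sent < len(drep_delivered) and drep_delivered[dreps_sent]
        dreps_sent += 1
        if delivered:
            return "IB_CM_DREP_RECEIVED"
    # Every DREQ was sent and no DREP ever arrived.
    return "IB_CM_DREQ_ERROR"


print(disconnect([True], timewait_retries=0))         # DREP arrives first time
print(disconnect([False], timewait_retries=0))        # DREP lost, peer already done
print(disconnect([False, True], timewait_retries=1))  # DREP lost, resent from timewait
```

In the second call the peer considers the disconnect finished as soon as it has sent its one DREP, so the lost DREP turns into a DREQ timeout on the sending side. In the third call the peer is still in timewait when the repeated DREQ arrives, so it can resend the DREP.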
- Sean