[ofw] ndrping issues
Fab Tillier
ftillier at microsoft.com
Tue Jan 5 12:37:59 PST 2010
Sean Hefty wrote on Tue, 5 Jan 2010 at 12:21:08
> Here are some more details on the ndrping issues that I mentioned during the
> con-call this morning. (Btw, the test fails with IBAL as well.)
>
> The test fails if the client runs on mthca, and the server runs on mlx4.
>
> The client code calls CreateEndpoint() using values of 0 for inbound and
> outbound RDMA reads. The 0's get mapped into the CM REQ messages.
I'll fix the test to use the CA's maximum for outbound read limit on the client side if the test will use reads.
> The server calls CreateEndpoint() using the local HW max values. These
> end up getting pushed into the CM REP message.
I'll fix the test to set outbound read limit to zero on the server, and cap the inbound read limit to the minimum of requested and the local CA's capabilities.
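A minimal sketch of the server-side selection described above, assuming a hypothetical `read_limits` struct and `server_read_limits()` helper (the real ND/IBAL types and calls differ). The server only accepts as many inbound reads as the client asked for and the local CA supports, and issues none of its own:

```c
#include <stdint.h>

/* Hypothetical type for illustration; not the actual ND or IBAL structure. */
typedef struct {
    uint32_t inbound_read_limit;  /* reads the peer may issue to us (responder resources) */
    uint32_t outbound_read_limit; /* reads we may issue to the peer (initiator depth) */
} read_limits;

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Server side: cap inbound reads at min(client's requested outbound reads,
 * local CA maximum). The server does not issue reads, so outbound is 0. */
static read_limits server_read_limits(uint32_t client_outbound_requested,
                                      uint32_t local_ca_max_inbound)
{
    read_limits r;
    r.inbound_read_limit = min_u32(client_outbound_requested, local_ca_max_inbound);
    r.outbound_read_limit = 0;
    return r;
}
```

With a client asking for 16 reads against a CA that supports 4, the server ends up with an inbound limit of 4 and an outbound limit of 0.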
> Both the client and server QPs get configured using the value carried
> in the REP. I'm not sure where the best place is for a fix.
That's probably the right thing to do, since the REP has the final value. When sending the REP, the CM checks that the responder resources don't exceed the capabilities of the HCA. It may be worth adding a check in the CM to ensure that section 12.7.29 of the IB spec (Responder Resources) isn't violated: the initiator depth should be less than or equal to the offered responder resources. Failing the connection here would be better than silently capping the values, especially if the responder resources offered in the REQ are 0.
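A rough sketch of the kind of CM validation suggested above, assuming a hypothetical `cm_read_params` struct and `cm_rep_read_params_valid()` function (neither is an existing IBAL or ND API). It rejects the REP outright rather than capping values: it keeps the existing HCA capability check on responder resources and adds the initiator depth vs. offered responder resources rule the message cites:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the RDMA read fields carried in a CM REQ or REP. */
typedef struct {
    uint8_t responder_resources; /* reads this side will accept */
    uint8_t initiator_depth;     /* reads this side will issue */
} cm_read_params;

/* Returns false (reject the connection) instead of silently capping. */
static bool cm_rep_read_params_valid(const cm_read_params *req,
                                     const cm_read_params *rep,
                                     uint8_t hca_max_responder_resources)
{
    /* Existing check: don't offer more responder resources than the HCA supports. */
    if (rep->responder_resources > hca_max_responder_resources)
        return false;
    /* Proposed check: the REP's initiator depth must not exceed the
     * responder resources offered in the REQ (0 means no reads allowed). */
    if (rep->initiator_depth > req->responder_resources)
        return false;
    return true;
}
```

In the failing ndrping case, a REQ offering 0 responder resources answered by a REP carrying the server's HW maximum initiator depth would be rejected instead of producing a QP that cannot send.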
> ndrping must at least be changed to pass in non-zero values (I think in
> place of InboundReadLimit) on the client side. It should probably be
> changed on the server side to call GetConnectionData() to check the
> provided values and select the minimum.
I think I'd like to see GetConnectionData cap the inbound read limit to what the HCA can support, and CreateEndpoint check that the provided OutboundReadLimit doesn't exceed what was offered. I'll fix the test so it isn't broken, though.
> Also, there seems to be a bug in the mthca driver that allows the QP to
> be configured with values that are too large, which results in the QP
> being unable to send data. Note that I'm running the test using RDMA
> writes, not reads, and the first send from the client is what never
> completes.
I would have expected the QP modify to fail in this case. Odd that it succeeded.
-Fab