[ofw] RE: NetworkDirect over WinVerbs

Tue Feb 10 17:08:43 PST 2009

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Tuesday, February 10, 2009 4:56 PM
> To: Fab Tillier; ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: NetworkDirect over WinVerbs
>
>>> The error code can change, but the timeout of the DREQ does indicate
>>> that a disconnect message was not received from the remote side.
>> TIMEOUT to me means try again, you might have better luck.  I think in
>> this case, a DREQ timeout means the other side is toast, and is really
>> more like a successful disconnect.
> To me, if the operation includes retries, timeout indicates a failure.

Yes, but a recoverable failure: you can try again.  So can you try again in this case?  <grumble>The STATUS_TIMEOUT value (0x00000102) is actually a success value as far as NT_SUCCESS() is concerned...</grumble>
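
To make the grumble concrete, here is a minimal sketch of the NTSTATUS check.  The two constants are the documented ntstatus.h values; `nt_success` and `severity` are hypothetical Python helpers that only mimic the NT_SUCCESS() macro and the severity field, and are not part of any real API.

```python
# NTSTATUS values as defined in ntstatus.h.
STATUS_TIMEOUT = 0x00000102      # wait timed out; severity bits say "success"
STATUS_IO_TIMEOUT = 0xC00000B5   # I/O timed out; severity bits say "error"


def nt_success(status: int) -> bool:
    """Hypothetical mirror of NT_SUCCESS(): true when the signed 32-bit
    NTSTATUS is >= 0, i.e. when the top bit is clear."""
    return (status & 0x80000000) == 0


def severity(status: int) -> int:
    """Top two bits: 0 = success, 1 = informational, 2 = warning, 3 = error."""
    return (status >> 30) & 0x3


if __name__ == "__main__":
    for name, value in (("STATUS_TIMEOUT", STATUS_TIMEOUT),
                        ("STATUS_IO_TIMEOUT", STATUS_IO_TIMEOUT)):
        print(f"{name}: severity={severity(value)} nt_success={nt_success(value)}")
```

So a caller that only tests for success would treat a STATUS_TIMEOUT return as a completed operation, which is part of why the choice of error code matters here.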

> This is not the same as a successful disconnect, since you can't
> determine what has failed.  It could be a switch or router that's
> causing the problem and not the remote system.

Does it matter what failed?  I guess my question is: if the DREQ fails, are you left in the connected state or in the disconnected state?  I think a TIMEOUT return would indicate no state change.

>> In the case where you have both sides handshaking to disconnect, you
>> *will* run into issues where one side receives the last message,
>> processes it, and moves the QP to error before the sender's HW has
>> received the ACK from the receiver's HW.  When you move the QP to
>> error, the HW stops generating ACKs/retries/etc for that QP, leaving
>> the sender to time out (which it eventually does) and the send completes
>> in error (retry exceeded.)
>>
>> Not at all what an app would expect, but that's what the HW does.
>>
>> So the QP transition to error needs to happen after the IB-level
>> disconnection happens. Sure, you could add an arbitrary delay and hope
>> things would work, but the disconnect handshake at the HW level allows
>> things to tear down politely - the client on each side can be expected
>> to not call Disconnect until all sends have completed locally.  This
>> can delay the DREP (and thus the QP transition at the sender) properly.
> See the 2-army problem.  How would you expect this to operate over TCP,
> where disconnect messages are exchanged in-band?

TCP supports half-closed sockets, so each side can close its sending direction once all of its sends have gone out, without impacting its receives.
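
As a rough illustration of the half-close, here is a sketch over a loopback TCP connection using the standard socket module.  The function name `half_close_demo` and the payloads are made up for the example.

```python
import socket


def recv_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending direction (recv returns b'')."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    # Loopback listener on an ephemeral port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    try:
        client.sendall(b"last send")
        client.shutdown(socket.SHUT_WR)      # done sending; can still receive

        received = recv_until_eof(server)    # server sees the data, then EOF

        # The server's sending direction is unaffected by the client's half-close.
        server.sendall(b"reply after peer half-closed")
        server.shutdown(socket.SHUT_WR)

        reply = recv_until_eof(client)
        return received, reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(half_close_demo())
```

The point is only that closing the send direction is an in-band, ordered event: the peer sees it after all earlier data, and the reverse direction keeps working until that side closes too.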

In any case, moving the QP to error right away on the receiver causes problems at the sender.  When I saw this it happened reliably, even in smaller configurations.  How do you suggest the sender handle this?
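
To spell out the race I mean, here is a toy model, not a description of any particular HCA.  It assumes zero propagation delay, a fixed retry interval, and that the sender missed the ACK for its first transmission for whatever reason (lost, or not yet generated).  The function and parameter names are all hypothetical.

```python
def sender_completion(qp_error_at, first_ack_missed,
                      retry_interval=1.0, retry_count=7):
    """Return how the sender's last send completes in this toy model.

    Times are arbitrary units measured from the first transmission.
    qp_error_at=None means the receiver never moves its QP to error.
    """
    for attempt in range(retry_count + 1):      # first transmission + retries
        sent_at = attempt * retry_interval
        # Assumption: once the receiver's QP is in error, its HW answers nothing.
        receiver_answers = qp_error_at is None or sent_at < qp_error_at
        ack_seen = receiver_answers and not (attempt == 0 and first_ack_missed)
        if ack_seen:
            return "success"
    return "retry exceeded"


if __name__ == "__main__":
    # Receiver moves to error right after processing the message.
    print(sender_completion(qp_error_at=0.5, first_ack_missed=True))
    # Receiver holds off until well after the sender's first retry.
    print(sender_completion(qp_error_at=10.0, first_ack_missed=True))
```

In the first case every retry lands on a QP that no longer responds, so the send completes in error even though the receiver processed the data.  In the second, delaying the transition (as the disconnect handshake would) leaves time for a retry to be ACKed.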

-Fab