[openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

Tue Nov 15 06:43:18 PST 2005

At 12:49 PM 11/14/2005, Nitin Hande wrote:
>Michael Krause wrote:
>>At 01:02 PM 11/11/2005, Ranjit Pandit wrote:
>>
>>>On 11/11/05, Michael Krause <krause at cup.hp.com> wrote:
>>> > Please clarify the following which was in the document provided by
>>> > Oracle.
>>> >
>>> > On page 3 of the RDS document, under the section "RDP Interface", the 2nd
>>> > and 3rd paragraphs state:
>>> >
>>> >    * RDP does not guarantee that a datagram is delivered to the remote
>>> >      application.
>>> >    * It is up to the RDP client to deal with datagrams lost due to
>>> >      transport failure or remote application failure.
>>> >
>>> > The HCA is still a fault domain with RDS - it does not address flushing
>>> > data out of the HCA fault domain, nor does it sound like it ensures that
>>> > CQE loss is recoverable.
>>> >
>>> > I do believe RDS will replay all of the sendmsg's that it believes are
>>> > pending, but it has no way to determine if already sent sendmsgs were
>>> > actually successfully delivered to the remote application unless it
>>> > provides some level of resync of the outstanding sends not completed
>>> > from an application's perspective as well as any state updated via RDMA
>>> > operations which may occur without an explicit send operation to flush
>>> > to a known state.  I'm still trying to ascertain whether RDS completely
>>> > recovers from HCA failure (assuming there is another HCA / path
>>> > available) between the two endnodes.
>>>
>>>RDS will replay the sends that are completed in error by the HCA,
>>>which typically would happen if the current path fails or the remote
>>>node/HCA dies.
>>
>>Does this mean that the receiving RDS entity is responsible for dealing 
>>with duplicates?
>I believe so...
>
>>A Send completion error does not mean that the
>>receiving endnode did not receive the data for either IB or iWARP; it 
>>only indicates that the Send operation failed which could be just a loss 
>>of the receive ACK with the Send completing on the receiver.  Such a
>>scenario would imply that RDS would have to comprehend what buffers have 
>>actually been consumed before retransmission, i.e. a resync is performed, 
>>else one could receive duplicate data at the application layer which can 
>>cause corruption or other problems as a function of the application 
>>(tolerance will vary by application thus the ULP must present consistent 
>>semantics to enable a broader set of applications than perhaps the 
>>initial targeted application to be supported).
>In the absence of any protocol level ack (and regardless of protocol level 
>ack), it is the application which has to implement its own reliability. 
>RDS becomes a passive channel passing packets back and forth, including 
>duplicate packets. The responsibility then shifts to the application to 
>figure out what is missing, what is duplicated, etc.

This would seem at odds with earlier assertions that as long as there were 
another path to the endnode, RDS would transparently recover on behalf of 
the application.  I thought Oracle stated for their application that send 
failure would be interpreted as endnode failure and cast out the peer - 
perhaps I misread their usage model.  Other applications that might want to 
use RDS could be designed to deal with the associated faults.  But if one has 
to deal with recovery / resync at the application layer, then that is quite 
a bit of work to perform in every application.  It is again at odds with the 
purpose of RDS, which is to move reliability to the interconnect and to RDS 
to the extent possible, so that the UDP application does not need to 
take on this complex code and attempt to get it right.
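
To make the resync point concrete, here is a minimal sketch in Python. It is
not RDS code and does not reflect how RDS is implemented; the Sender and
Receiver classes, the per-datagram sequence numbers, and the resync() call are
all hypothetical names chosen for illustration. The assumption is that the
receiver tracks the next sequence number it expects, so that after a path
failure the sender can learn what was actually consumed and replay only the
rest.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hands each datagram to the application at most once, in order."""
    next_expected: int = 0
    delivered: list = field(default_factory=list)

    def on_receive(self, seq: int, payload: bytes) -> int:
        # A replayed send carries a sequence number that was already
        # consumed; drop it rather than pass a duplicate upward.
        if seq == self.next_expected:
            self.delivered.append(payload)
            self.next_expected += 1
        # Everything below the returned value has been consumed.
        return self.next_expected


@dataclass
class Sender:
    next_seq: int = 0
    pending: dict = field(default_factory=dict)  # seq -> payload, unconfirmed

    def send(self, payload: bytes) -> int:
        seq = self.next_seq
        self.pending[seq] = payload
        self.next_seq += 1
        return seq

    def resync(self, receiver_next_expected: int) -> list:
        # Forget sends the receiver already consumed, even if the local
        # completion reported an error (for example, only the ACK was lost).
        for seq in [s for s in self.pending if s < receiver_next_expected]:
            del self.pending[seq]
        # Whatever remains must be replayed on the new path, in order.
        return sorted(self.pending.items())
```

For example, if three datagrams are sent, the receiver consumes the first two,
and the path then fails, resync(2) returns only the third datagram for replay.
A blind replay of the second would be dropped by on_receive() rather than
delivered twice.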

Mike

>Thanks
>Nitin
>
>
>>
>>>In case of a catastrophic error on the local HCA, subsequent sends will 
>>>fail for a certain time (session_time_wait), as if there was no 
>>>alternate path available at that time. On getting an error the 
>>>application should discard any sends unacknowledged by its peer and 
>>>take corrective action.
>>
>>Unacknowledged by the peer means at the interconnect or the application 
>>level?  Again, how is the receive buffer management handled?
>>
>>>After the time_wait is over, subsequent sends will initiate a brand new 
>>>connection which could use the alternate HCA (if the path is available).
>>
>>This is understood.
>>Mike
>>