[openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

Fri Nov 11 15:55:53 PST 2005

At 01:01 PM 11/11/2005, Nitin Hande wrote:
>Michael Krause wrote:
>>At 10:28 AM 11/9/2005, Rick Frank wrote:
>>
>>>Yes, the application is responsible for detecting lost msgs at the 
>>>application level - the transport can not do this.
>>>
>>>RDS does not guarantee that a message has been delivered to the 
>>>application - just that once the transport has accepted a msg it will 
>>>deliver the msg to the remote node in order without duplication - 
>>>dealing with retransmissions, etc due to sporadic / intermittent msg 
>>>loss over the interconnect. If after accepting the send - the current 
>>>path fails - then RDS will transparently fail over to another path - and 
>>>if required will resend / send any already queued msgs to the remote 
>>>node - again ensuring that no msg is duplicated and they are in 
>>>order.  This is no different than APM - with the exception that RDS can 
>>>do this across HCAs.
>>>
>>>The application - Oracle in this case - will deal with detecting a 
>>>catastrophic path failure - either due to a send that does not arrive 
>>>and/or a timed-out response or send failure returned from the transport. 
>>>If there is no network path to a remote node - it is required that we 
>>>remove the remote node from the operating cluster to avoid what is 
>>>commonly termed as a "split brain" condition - otherwise known as a 
>>>"partition in time".
>>>
>>>BTW - in our case - the application failure domain logic is the same 
>>>whether we are using UDP /  uDAPL / iTAPI / TCP / SCTP / etc. Basically, 
>>>if we can not talk to a remote node - after some defined period of time 
>>>- we will remove the remote node from the cluster. In this case the 
>>>database will recover all the interesting state that may have been 
>>>maintained on the removed node - allowing the remaining nodes to 
>>>continue. If later on, communication to the remote node is restored - it 
>>>will be allowed to rejoin the cluster and take on application load.
>>
>>Please clarify the following which was in the document provided by Oracle.
>>On page 3 of the RDS document, under the section "RDP Interface", the 2nd 
>>and 3rd paragraphs state:
>>    * RDP does not guarantee that a datagram is delivered to the remote 
>> application.
>>    * It is up to the RDP client to deal with datagrams lost due to 
>> transport failure or remote application failure.
>>The HCA is still a fault domain with RDS - it does not address flushing 
>>data out of the HCA fault domain, nor does it sound like it ensures that 
>>CQE loss is recoverable.
>>I do believe RDS will replay all of the sendmsg's that it believes are 
>>pending, but it has no way to determine if already sent sendmsgs were 
>>actually successfully delivered to the remote application unless it 
>>provides some level of resync of the outstanding sends not completed from 
>>an application's perspective as well as any state updated via RDMA 
>>operations which may occur without an explicit send operation to flush to 
>>a known state.
>If RDS could define a mechanism that the application could use to inform 
>the sender to resync and replay on catastrophic failure, is that a correct 
>understanding of your suggestion ?

I'm not suggesting anything at this point. I'm trying to reconcile the 
documentation with the e-mail statements made by its proponents.

>>I'm still trying to ascertain whether RDS completely 
>>recovers from HCA failure (assuming there is another HCA / path 
>>available) between the two endnodes.
>Reading the doc and the thread, it looks like we need src/dst ports for 
>multiplexing connections, seq/ack numbers for resyncing, and some 
>kind of window advertisement for flow control. Aren't we very close to a TCP 
>header?

TCP, as implemented by most operating systems, does not provide an end-to-end 
delivery guarantee to the application. Unless one ties the TCP ACK to the 
application's consumption of the receive 
data, there is no method to ascertain that the application really received 
the data.   The application would be required to send its own 
application-level acknowledgement.   I believe the intent is for 
applications to remain responsible for the end-to-end receipt of data and 
that RDS and the interconnect are simply responsible for the exchange at 
the lower levels.
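
To make the application-level acknowledgement concrete, here is a minimal 
sketch in Python. It is not RDS code and not taken from the Oracle document; 
the names Sender, Receiver, send, deliver, on_app_ack and replay are 
hypothetical, and the transport is assumed to be whatever sits underneath. 
The point is only that the ack is generated after the application has 
consumed the message, so the sender knows what to replay after a path or 
HCA failure, and the receiver drops anything it has already consumed.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Keeps each message until the remote *application* acknowledges it."""
    next_seq: int = 0
    unacked: dict = field(default_factory=dict)  # seq -> payload

    def send(self, payload):
        # Assign a sequence number and remember the message until it is acked.
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq, payload

    def on_app_ack(self, seq):
        # The ack comes from the application, not from the transport.
        self.unacked.pop(seq, None)

    def replay(self):
        # After a failure, resend everything still unacknowledged, in order.
        return sorted(self.unacked.items())


@dataclass
class Receiver:
    """Consumes messages and suppresses duplicates caused by replay."""
    consumed: list = field(default_factory=list)
    seen: set = field(default_factory=set)

    def deliver(self, seq, payload):
        # Consume first, then return the sequence number as the ack.
        if seq not in self.seen:
            self.seen.add(seq)
            self.consumed.append(payload)
        return seq
```

In use, the sender would call replay() once a new path is up and feed each 
returned ack to on_app_ack(); a message that was consumed but whose ack was 
lost gets resent and is silently dropped by the receiver.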

Mike 