[openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

Nitin Hande Nitin.Hande at Sun.COM
Fri Nov 11 13:01:17 PST 2005


Michael Krause wrote:
> At 10:28 AM 11/9/2005, Rick Frank wrote:
> 
>> Yes, the application is responsible for detecting lost msgs at the 
>> application level - the transport cannot do this.
>>  
>> RDS does not guarantee that a message has been delivered to the 
>> application - just that once the transport has accepted a msg it will 
>> deliver the msg to the remote node in order without duplication - 
>> dealing with retransmissions, etc due to sporadic / intermittent msg 
>> loss over the interconnect. If after accepting the send - the current 
>> path fails - then RDS will transparently fail over to another path - 
>> and if required will resend / send any already queued msgs to the 
>> remote node - again ensuring that no msg is duplicated and they are in 
>> order.  This is no different than APM - with the exception that RDS 
>> can do this across HCAs.
>>  
>> The application - Oracle in this case - will deal with detecting a 
>> catastrophic path failure - either due to a send that does not arrive 
>> and/or a timed-out response or send failure returned from the 
>> transport. If there is no network path to a remote node - it is 
>> required that we remove the remote node from the operating cluster to 
>> avoid what is commonly termed as a "split brain" condition - otherwise 
>> known as a "partition in time".
>>  
>> BTW - in our case - the application failure domain logic is the same 
>> whether we are using UDP /  uDAPL / iTAPI / TCP / SCTP / etc. 
>> Basically, if we cannot talk to a remote node - after some defined 
>> period of time - we will remove the remote node from the cluster. In 
>> this case the database will recover all the interesting state that may 
>> have been maintained on the removed node - allowing the remaining 
>> nodes to continue. If later on, communication to the remote node is 
>> restored - it will be allowed to rejoin the cluster and take on 
>> application load. 
> 
> 
> 
> Please clarify the following which was in the document provided by Oracle.
> 
> On page 3 of the RDS document, under the section "RDP Interface", the 
> 2nd and 3rd paragraphs state:
> 
>    * RDP does not guarantee that a datagram is delivered to the remote 
> application.
>    * It is up to the RDP client to deal with datagrams lost due to 
> transport failure or remote application failure.
> 
> The HCA is still a fault domain with RDS - it does not address flushing 
> data out of the HCA fault domain, nor does it sound like it ensures that 
> CQE loss is recoverable.
> 
> I do believe RDS will replay all of the sendmsg's that it believes are 
> pending, but it has no way to determine if already sent sendmsgs were 
> actually successfully delivered to the remote application unless it 
> provides some level of resync of the outstanding sends not completed 
> from an application's perspective as well as any state updated via RDMA 
> operations which may occur without an explicit send operation to flush 
> to a known state.  
So the suggestion is that RDS define a mechanism the application could 
use to tell the sender to resync and replay after a catastrophic 
failure. Is that a correct understanding?
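
To make the question concrete, here is a rough sketch in C of what such a
resync could look like on the sending side. Nothing here comes from the
RDS document: the names (sketch_sendq, sketch_resync) and the idea that
the receiver reports the next sequence number its application expects
are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sender-side bookkeeping: messages that were handed to the
 * transport but not yet confirmed as delivered to the remote application.
 * Retained sequence numbers are first_seq .. first_seq + count - 1. */
struct sketch_sendq {
    uint64_t first_seq;  /* oldest retained message */
    size_t   count;      /* number of retained messages */
};

/* Called after a path/HCA failover, once the peer has reported the next
 * sequence number its application expects (an assumed resync message).
 * Drops everything the peer already has, stores the first sequence number
 * to replay in *replay_from, and returns how many messages to replay. */
size_t sketch_resync(struct sketch_sendq *q, uint64_t next_expected,
                     uint64_t *replay_from)
{
    assert(q != NULL && replay_from != NULL);

    uint64_t end = q->first_seq + q->count;  /* one past the newest message */

    if (next_expected > end)
        next_expected = end;                 /* clamp an out-of-range report */

    if (next_expected > q->first_seq) {      /* peer confirmed some messages */
        q->count -= (size_t)(next_expected - q->first_seq);
        q->first_seq = next_expected;
    }                                        /* a stale report changes nothing */

    *replay_from = q->first_seq;
    return q->count;
}
```

With 10..14 outstanding and the peer reporting 13 as next expected, the
sender would drop 10..12 and replay only 13 and 14, so nothing is
duplicated. This covers sends only; it says nothing about state updated
via RDMA operations.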

> I'm still trying to ascertain whether RDS completely
> recovers from HCA failure (assuming there is another HCA / path 
> available) between the two endnodes.
Reading the doc and the thread, it looks like we need src/dst ports 
for multiplexing connections, seq/ack numbers for resyncing, and 
some kind of window availability for flow control. Aren't we very 
close to a TCP header?
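
As a rough illustration, the fields listed above could be laid out as
below. This is only a sketch: the struct name, field names and widths
are hypothetical and not taken from the RDS document. The widths are
simply borrowed from the corresponding TCP fields (16-bit ports, 32-bit
seq/ack, 16-bit window).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header; names and widths are illustrative only. */
struct rds_sketch_hdr {
    uint16_t src_port;  /* multiplexing: sending endpoint */
    uint16_t dst_port;  /* multiplexing: receiving endpoint */
    uint32_t seq;       /* sequence number of this datagram */
    uint32_t ack;       /* next sequence number expected from the peer */
    uint16_t window;    /* receive space advertised for flow control */
};

#define RDS_SKETCH_HDR_LEN 14  /* 2 + 2 + 4 + 4 + 2 bytes on the wire */

/* Big-endian (network byte order) helpers. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)(v & 0xffff));
}

static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get32(const uint8_t *p)
{
    return ((uint32_t)get16(p) << 16) | get16(p + 2);
}

/* Serialize the header into buf (at least RDS_SKETCH_HDR_LEN bytes);
 * returns the number of bytes written. */
size_t rds_sketch_pack(const struct rds_sketch_hdr *h, uint8_t *buf)
{
    assert(h != NULL && buf != NULL);
    put16(buf, h->src_port);
    put16(buf + 2, h->dst_port);
    put32(buf + 4, h->seq);
    put32(buf + 8, h->ack);
    put16(buf + 12, h->window);
    return RDS_SKETCH_HDR_LEN;
}

/* Parse RDS_SKETCH_HDR_LEN bytes from buf back into a header. */
void rds_sketch_unpack(const uint8_t *buf, struct rds_sketch_hdr *h)
{
    assert(h != NULL && buf != NULL);
    h->src_port = get16(buf);
    h->dst_port = get16(buf + 2);
    h->seq      = get32(buf + 4);
    h->ack      = get32(buf + 8);
    h->window   = get16(buf + 12);
}
```

That is 14 bytes against TCP's 20-byte fixed header, and the first four
fields sit in the same order and at the same offsets as in TCP.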

Nitin

> 
> Mike
> 
> 