[openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

Michael Krause krause at cup.hp.com
Wed Nov 9 12:21:06 PST 2005


At 10:28 AM 11/9/2005, Rick Frank wrote:
>Yes, the application is responsible for detecting lost msgs at the 
>application level - the transport cannot do this.
>
>RDS does not guarantee that a message has been delivered to the 
>application - just that once the transport has accepted a msg, it will 
>deliver the msg to the remote node in order and without duplication - dealing 
>with retransmissions, etc. due to sporadic / intermittent msg loss over the 
>interconnect. If, after accepting the send, the current path fails, then 
>RDS will transparently fail over to another path - and, if required, will 
>resend any already queued msgs to the remote node - again ensuring 
>that no msg is duplicated and that msgs arrive in order.  This is no different 
>from APM - with the exception that RDS can do this across HCAs.
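
For concreteness, a minimal sketch in C of the socket model described above,
assuming a PF_RDS socket family like the one later merged into Linux
(net/rds); the addresses, port, and fallback constant are illustrative only:

    /* Sketch: the app sees a plain connectionless datagram socket; ordering,
     * retransmission, and path failover live in the transport below it. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #ifndef PF_RDS
    #define PF_RDS 21                    /* AF_RDS in later Linux headers */
    #endif

    int main(void)
    {
        int fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in local = { .sin_family = AF_INET };
        local.sin_port = htons(4000);                 /* illustrative */
        inet_pton(AF_INET, "192.168.0.1", &local.sin_addr);
        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
            perror("bind"); return 1;
        }

        struct sockaddr_in peer = { .sin_family = AF_INET };
        peer.sin_port = htons(4000);                  /* illustrative */
        inet_pton(AF_INET, "192.168.0.2", &peer.sin_addr);

        /* No connect(): each sendto() names its destination.  A successful
         * return means the transport accepted the msg - NOT that the remote
         * application received it, which is exactly the distinction at
         * issue in this thread. */
        const char msg[] = "hello";
        if (sendto(fd, msg, sizeof(msg), 0,
                   (struct sockaddr *)&peer, sizeof(peer)) < 0)
            perror("sendto");

        close(fd);
        return 0;
    }
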
>
>The application - Oracle in this case - will deal with detecting a 
>catastrophic path failure - whether due to a send that does not arrive, a 
>timed-out response, or a send failure returned from the transport. If 
>there is no network path to a remote node, we must remove 
>the remote node from the operating cluster to avoid what is commonly 
>termed a "split brain" condition - otherwise known as a "partition in time".
>
>BTW - in our case, the application failure-domain logic is the same 
>whether we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc. Basically, 
>if we cannot talk to a remote node after some defined period of time, 
>we will remove the remote node from the cluster (as sketched below). In this 
>case the database will recover all the interesting state that may have been 
>maintained on the removed node, allowing the remaining nodes to continue. 
>If, later on, communication to the remote node is restored, it will be 
>allowed to rejoin the cluster and take on application load.
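
A hedged sketch of that timeout-driven eviction, in C; the node table,
deadline, and callbacks are hypothetical illustrations, not Oracle's actual
membership code:

    #include <stdbool.h>
    #include <time.h>

    #define MAX_NODES    8
    #define EVICT_SECS  30           /* "defined period of time" - illustrative */

    struct node {
        bool   member;               /* currently in the operating cluster? */
        time_t last_heard;           /* last msg or heartbeat from this node */
    };

    static struct node cluster[MAX_NODES];

    /* Call on every msg received from node 'id'. */
    void node_heard(int id)
    {
        cluster[id].last_heard = time(NULL);
    }

    /* Called periodically: evict silent nodes so the survivors can recover
     * their state, avoiding the "split brain" described above. */
    void check_membership(void (*evict)(int), void (*recover_state)(int))
    {
        time_t now = time(NULL);
        for (int id = 0; id < MAX_NODES; id++) {
            if (cluster[id].member &&
                now - cluster[id].last_heard > EVICT_SECS) {
                cluster[id].member = false;
                evict(id);           /* remove from the operating cluster */
                recover_state(id);   /* database recovers its state */
            }
        }
    }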

One may be able to talk to the remote node across another HCA, but that does 
not mean one has an understanding of the state at the remote node unless 
the failure is noted and a resync of state occurs, or the remote is able to 
deal with duplicates, etc.   This has nothing to do with the API or the 
transport involved; it is, as Caitlin noted, the difference between knowing a 
send buffer is free vs. knowing that the remote application received the 
data.  Therefore, using RDS has only reduced the reliability / robustness 
problem space to some extent; it has not solved it.
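
To make the distinction concrete, a sketch of the kind of application- or
middleware-level acknowledgement layer being argued for here; the sequence
window, names, and sizes are all hypothetical:

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define WINDOW 64

    struct pending {
        uint64_t seq;
        bool     in_flight;          /* sent, but no end-to-end ack yet */
        char     payload[256];
    };

    static struct pending window[WINDOW];
    static uint64_t next_seq;

    /* Send via the transport (xmit) but retain the msg until the peer
     * APPLICATION acknowledges it - a send completion alone only means
     * the local buffer is free for reuse. */
    uint64_t app_send(const char *data, void (*xmit)(const struct pending *))
    {
        struct pending *p = &window[next_seq % WINDOW];
        p->seq = next_seq++;
        p->in_flight = true;
        strncpy(p->payload, data, sizeof(p->payload) - 1);
        p->payload[sizeof(p->payload) - 1] = '\0';
        xmit(p);
        return p->seq;
    }

    /* Only this - the peer application's own ack - proves delivery. */
    void on_app_ack(uint64_t seq)
    {
        window[seq % WINDOW].in_flight = false;
    }

    /* After an HCA / path failover, replay everything not acked end to
     * end; the receiver discards seqs it has already processed, so the
     * replay cannot duplicate data. */
    void replay_unacked(void (*xmit)(const struct pending *))
    {
        for (int i = 0; i < WINDOW; i++)
            if (window[i].in_flight)
                xmit(&window[i]);
    }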

Mike

>
>
>----- Original Message -----
>From: Michael Krause <krause at cup.hp.com>
>To: Ranjit Pandit <rpandit at silverstorm.com>
>Cc: openib-general at openib.org
>Sent: Tuesday, November 08, 2005 4:08 PM
>Subject: Re: [openib-general] [ANNOUNCE] Contribute 
>RDS(ReliableDatagramSockets) to OpenIB
>
>At 12:33 PM 11/8/2005, Ranjit Pandit wrote:
>> > Mike wrote:
>> > - RDS does not solve a set of failure models.  For example, if an
>> > RNIC / HCA were to fail, then one cannot simply replay the operations
>> > on another RNIC / HCA without extracting state, etc. and providing
>> > some end-to-end sync of what was really sent / received by the
>> > application.  Yes, one can recover from cable or switch port failure
>> > by using APM-style recovery, but that is only one class of faults.
>> > The harder faults either result in the end node being cast out of the
>> > cluster or in silent data corruption unless additional steps are
>> > taken to transparently recover - again, app writers don't want to
>> > solve the hard problems; they want that done for them.
>>
>>The current reference implementation of RDS solves the HCA failure case 
>>as well.
>>Since applications don't need to keep connection state, it's easier
>>to handle cases like HCA and intermediate path failures.
>>As far as the application is concerned, every sendmsg 'could' result in a
>>new connection setup in the driver.
>>If the current path fails, RDS reestablishes a connection, if
>>available, on a different port or a different HCA, and replays the
>>failed messages.
>>Using APM is not useful here because it doesn't provide failover across HCAs.
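
A rough sketch of the lazy-connection model described above, as such a driver
might structure it; every identifier here is hypothetical rather than taken
from the reference implementation:

    #include <stdint.h>
    #include <stddef.h>

    struct msg_queue { int opaque; };    /* placeholder in this sketch */

    struct rds_conn {
        uint32_t peer_addr;
        int      up;                     /* underlying connection established? */
        struct msg_queue pending;        /* accepted, not yet acked by peer */
    };

    /* Hypothetical helpers assumed to exist inside the driver: */
    struct rds_conn *conn_lookup(uint32_t peer);
    struct rds_conn *conn_create(uint32_t peer);
    void conn_establish(struct rds_conn *c);    /* any working port / HCA */
    void msg_enqueue(struct msg_queue *q, const void *buf, size_t len);
    int  transmit(struct rds_conn *c);
    void retransmit_pending(struct rds_conn *c);

    /* The app never manages connections: sendmsg() finds or creates one. */
    int rds_sendmsg(uint32_t peer, const void *buf, size_t len)
    {
        struct rds_conn *c = conn_lookup(peer);
        if (!c)
            c = conn_create(peer);           /* first send sets it up */
        if (!c->up)
            conn_establish(c);
        msg_enqueue(&c->pending, buf, len);  /* kept for possible replay */
        return transmit(c);
    }

    /* On path failure: reconnect over another port / HCA and replay the
     * queued msgs in order, without duplication. */
    void rds_conn_failed(struct rds_conn *c)
    {
        c->up = 0;
        conn_establish(c);
        retransmit_pending(c);
    }
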
>
>I think others may disagree about whether RDS solves the problem.  You 
>have no way of knowing whether something was received into the 
>other node's coherency domain without some intermediary's or the 
>application's involvement to confirm the data arrived.  As such, you might 
>see many hardware-level acks occur and still not know there is a real 
>failure.  If an application takes any action assuming that send complete 
>means the data was delivered, then it is subject to silent data 
>corruption.  Hence, RDS can replay to its heart's content, but until there 
>is an application- or middleware-level acknowledgement, you have not 
>solved the fault-domain issues.  Some may be happy with this, as they just 
>cast out the end node from the cluster / database, but others see the loss 
>of a server as a big deal and so may not be happy to see this occur.  It 
>really comes down to whether you believe losing a server is worthwhile for 
>a local failure event which is not fatal to the rest of the server.
>
>APM's value is the ability to recover from link failure.  It has the same 
>value for any other ULP in that it recovers transparently to the ULP.
>
>Mike
>
>