[openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB
Nitin Hande
Nitin.Hande at Sun.COM
Fri Nov 11 13:01:17 PST 2005
Michael Krause wrote:
> At 10:28 AM 11/9/2005, Rick Frank wrote:
>
>> Yes, the application is responsible for detecting lost msgs at the
>> application level - the transport cannot do this.
>>
>> RDS does not guarantee that a message has been delivered to the
>> application - just that once the transport has accepted a msg it will
>> deliver the msg to the remote node in order without duplication -
>> dealing with retransmissions, etc., due to sporadic / intermittent msg
>> loss over the interconnect. If after accepting the send - the current
>> path fails - then RDS will transparently fail over to another path -
>> and if required will resend / send any already queued msgs to the
>> remote node - again ensuring that no msg is duplicated and they are in
>> order. This is no different than APM - with the exception that RDS
>> can do this across HCAs.
>>
>> The application - Oracle in this case - will deal with detecting a
>> catastrophic path failure - either due to a send that does not arrive
>> and/or a timed-out response or send failure returned from the
>> transport. If there is no network path to a remote node - it is
>> required that we remove the remote node from the operating cluster to
>> avoid what is commonly termed as a "split brain" condition - otherwise
>> known as a "partition in time".
>>
>> BTW - in our case - the application failure domain logic is the same
>> whether we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc.
>> Basically, if we cannot talk to a remote node - after some defined
>> period of time - we will remove the remote node from the cluster. In
>> this case the database will recover all the interesting state that may
>> have been maintained on the removed node - allowing the remaining
>> nodes to continue. If later on, communication to the remote node is
>> restored - it will be allowed to rejoin the cluster and take on
>> application load.
>
>
>
> Please clarify the following which was in the document provided by Oracle.
>
> On page 3 of the RDS document, under the section "RDP Interface", the
> 2nd and 3rd paragraphs state:
>
> * RDP does not guarantee that a datagram is delivered to the remote
> application.
> * It is up to the RDP client to deal with datagrams lost due to
> transport failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address flushing
> data out of the HCA fault domain, nor does it sound like it ensures that
> CQE loss is recoverable.
>
> I do believe RDS will replay all of the sendmsg's that it believes are
> pending, but it has no way to determine if already sent sendmsgs were
> actually successfully delivered to the remote application unless it
> provides some level of resync of the outstanding sends not completed
> from an application's perspective as well as any state updated via RDMA
> operations which may occur without an explicit send operation to flush
> to a known state.
Are you suggesting that RDS define a mechanism the application could
use to tell the sender to resync and replay after a catastrophic
failure? Is that a correct understanding of your suggestion?
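To make the question concrete, here is a minimal sketch of what such a resync could look like. It assumes a per-peer sequence number and an application-level acknowledgement; the class and method names (ReplaySender, ReplayReceiver, app_ack, resync) are hypothetical and do not come from the RDS document.

```python
from collections import OrderedDict


class ReplaySender:
    """Hypothetical sender: keeps each message until the remote
    application (not just the remote HCA) has acknowledged it."""

    def __init__(self):
        self.next_seq = 0
        self.pending = OrderedDict()  # seq -> payload, in send order

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload
        return seq

    def app_ack(self, seq):
        # Cumulative ack: the application has consumed everything <= seq.
        for s in list(self.pending):
            if s <= seq:
                del self.pending[s]

    def resync(self, last_delivered):
        """After a path or HCA failure, the receiver reports the last
        sequence number its application consumed (-1 if none).
        Returns the (seq, payload) pairs that must be replayed."""
        self.app_ack(last_delivered)
        return list(self.pending.items())


class ReplayReceiver:
    """Hypothetical receiver: delivers in order and drops duplicates
    that a replay may produce."""

    def __init__(self):
        self.last_delivered = -1
        self.delivered = []

    def receive(self, seq, payload):
        if seq != self.last_delivered + 1:
            return False  # duplicate or out of order: not delivered
        self.delivered.append(payload)
        self.last_delivered = seq
        return True
```

In this sketch the receiver's last_delivered value is the only state exchanged at resync time. It does nothing for state updated via RDMA operations without an explicit send, which is the harder part of the concern raised above.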
> I'm still trying to ascertain whether RDS completely
> recovers from HCA failure (assuming there is another HCA / path
> available) between the two endnodes.
Reading the doc and the thread, it looks like we need src/dst ports
for multiplexing connections, seq/ack numbers for resyncing, and
some kind of window advertisement for flow control. Aren't we getting
very close to a TCP header?
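As a rough illustration of that point, the fields listed above could be packed as follows. This is only a sketch: the layout, field widths, and names are assumptions for the sake of comparison, not anything defined by RDS.

```python
import struct

# Hypothetical layout, network byte order, no padding:
#   src_port (16 bits), dst_port (16 bits),
#   seq (32 bits), ack (32 bits), window (16 bits)
HEADER = struct.Struct("!HHIIH")


def pack_header(src_port, dst_port, seq, ack, window):
    """Serialize the five fields into a 14-byte header."""
    return HEADER.pack(src_port, dst_port, seq, ack, window)


def unpack_header(data):
    """Return (src_port, dst_port, seq, ack, window) from the first
    HEADER.size bytes of data."""
    return HEADER.unpack(data[:HEADER.size])
```

The fixed part of a TCP header carries these same five fields, plus the data offset and flags, a checksum, and an urgent pointer.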
Nitin
>
> Mike