<html>

<body>

<font size=3>At 12:49 PM 11/14/2005, Nitin Hande wrote:<br>

<blockquote type=cite class=cite cite="">Michael Krause wrote:<br>

<blockquote type=cite class=cite cite="">At 01:01 PM 11/11/2005, Nitin

Hande wrote:<br><br>

<blockquote type=cite class=cite cite="">Michael Krause wrote:<br><br>

<blockquote type=cite class=cite cite="">At 10:28 AM 11/9/2005, Rick

Frank wrote:<br><br>

<blockquote type=cite class=cite cite="">Yes, the application is

responsible for detecting lost msgs at the application level; the
transport cannot do this.<br>

 <br>

RDS does not guarantee that a message has been delivered to the
application, only that once the transport has accepted a msg, it will
deliver the msg to the remote node in order and without duplication,
handling retransmissions, etc. caused by sporadic or intermittent msg loss
over the interconnect. If the current path fails after the send has been
accepted, RDS will transparently fail over to another path and, if
required, resend or send any already-queued msgs to the remote node, again
ensuring that no msg is duplicated and that they arrive in order. This is
no different from APM (Automatic Path Migration), except that RDS can do
this across HCAs.<br>
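As a rough illustration of the transport-side guarantee described above (once a msg is accepted it reaches the remote node in order and exactly once, and queued msgs are resent after a path failover), here is a minimal Python model. Everything in it is hypothetical: the class names, the per-msg sequence numbers, the cumulative ack, the lose flag and the path names are assumptions for discussion, not the RDS implementation or API.<br><br>

```python
from collections import OrderedDict


class Receiver:
    """Hypothetical remote-node model: in-order delivery, duplicates dropped."""

    def __init__(self):
        self.next_seq = 0      # next sequence number expected
        self.delivered = []    # msgs handed to the remote node, in order

    def on_packet(self, seq, payload):
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # seq < next_seq is a duplicate (a resend) and seq > next_seq is out of
        # order; both are dropped here, and the sender resends any gap later.
        return self.next_seq - 1   # cumulative ack: highest in-order seq


class Sender:
    """Hypothetical sender: keeps accepted msgs queued until they are acked."""

    def __init__(self, receiver, paths):
        self.receiver = receiver
        self.paths = list(paths)   # illustrative path names, e.g. one per HCA
        self.path = self.paths[0]  # path in use (bookkeeping only in this model)
        self.next_seq = 0
        self.unacked = OrderedDict()   # seq -> payload, accepted but not acked

    def send(self, payload, lose=False):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload    # accepted: the transport now owns it
        if not lose:                   # lose=True simulates loss on a failing path
            self._transmit(seq, payload)
        return seq

    def _transmit(self, seq, payload):
        acked = self.receiver.on_packet(seq, payload)
        # Drop everything the cumulative ack covers from the resend queue.
        for s in [s for s in self.unacked if s <= acked]:
            del self.unacked[s]

    def fail_over(self):
        # Move to the next path, then resend every still-queued msg in order.
        self.paths.append(self.paths.pop(0))
        self.path = self.paths[0]
        for seq, payload in list(self.unacked.items()):
            self._transmit(seq, payload)
```

For example, with "a" delivered, "b" lost on the failing path (send("b", lose=True)) and "c" dropped as out of order, calling fail_over() resends "b" and "c", and the receiver ends up with ['a', 'b', 'c'], each exactly once. Note that this only models delivery to the remote node, not consumption by the remote application.<br><br>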

 <br>

The application (Oracle, in this case) will deal with detecting a
catastrophic path failure, whether due to a send that does not arrive, a
timed-out response, or a send failure returned from the transport. If
there is no network path to a remote node, we are required to remove
the remote node from the operating cluster to avoid what is commonly
termed a "split brain" condition, otherwise known as a
"partition in time".<br>

 <br>

BTW, in our case the application failure-domain logic is the same
whether we are using UDP, uDAPL, iTAPI, TCP, SCTP, etc.
Basically, if we cannot talk to a remote node for some defined
period of time, we will remove the remote node from the cluster. In that
case the database will recover all the interesting state that may have
been maintained on the removed node, allowing the remaining nodes to
continue. If communication to the remote node is later restored, it
will be allowed to rejoin the cluster and take on application load.
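<br><br>The transport-independent policy described above (evict a node that has been unreachable for a defined period, let it rejoin once communication is restored) can be sketched as a simple liveness table. This is a minimal model under stated assumptions: the Membership class, the heartbeat-style heard_from() call and the timeout value are hypothetical, and it omits everything a real cluster manager needs to avoid split brain (quorum, fencing, state recovery). It is not Oracle's clustering logic.<br><br>

```python
import time


class Membership:
    """Hypothetical, transport-independent cluster membership table."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # the "defined period of time" (assumed value)
        self.clock = clock           # injectable clock, handy for testing
        self.last_heard = {}         # node -> time we last heard from it
        self.members = set()

    def heard_from(self, node):
        # Any successful exchange (UDP, uDAPL, TCP, ...) counts as liveness.
        self.last_heard[node] = self.clock()
        # A node whose communication is restored is allowed to (re)join.
        self.members.add(node)

    def evict_silent_nodes(self):
        # Remove members not heard from within the timeout; in a real system
        # the survivors would then recover the evicted node's state.
        now = self.clock()
        evicted = [n for n in self.members
                   if now - self.last_heard[n] > self.timeout_s]
        for n in evicted:
            self.members.discard(n)
        return evicted
```

For example, with timeout_s=10, a node last heard from at t=0 is evicted by a check at t=12, while a node heard from at t=5 survives; the evicted node rejoins as soon as heard_from() is called for it again.<br>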

</blockquote><br><br>

Please clarify the following, which was in the document provided by
Oracle.<br>
On page 3 of the RDS document, under the section "RDP
Interface", the 2nd and 3rd paragraphs state:<br>

   * RDP does not guarantee that a datagram is delivered to the

remote application.<br>

   * It is up to the RDP client to deal with datagrams lost due

to transport failure or remote application failure.<br>

The HCA is still a fault domain with RDS: it does not address flushing
data out of the HCA fault domain, nor does it sound like it ensures that
CQE (completion queue entry) loss is recoverable.<br>

I do believe RDS will replay all of the sendmsgs that it believes are
pending, but it has no way to determine whether already-sent sendmsgs were
actually delivered to the remote application unless it provides some
level of resync, covering both the outstanding sends not yet completed
from the application's perspective and any state updated via RDMA
operations, which may occur without an explicit send operation to flush to
a known state.</blockquote><br>

Is your suggestion, then, that RDS define a mechanism the application
could use to tell the sender to resync and replay on catastrophic
failure? Is that a correct understanding?</blockquote><br>

I'm not suggesting anything at this point. I'm trying to reconcile the

documentation with the e-mail statements made by its proponents.<br><br>

<blockquote type=cite class=cite cite="">I'm still trying to ascertain

whether RDS completely<br><br>

<blockquote type=cite class=cite cite="">recovers from HCA failure

(assuming there is another HCA / path available) between the two

endnodes</blockquote><br>

Reading the doc and the thread, it looks like we need src/dst ports for
multiplexing connections, seq/ack numbers for resyncing, and some kind of
window availability for flow control. Aren't we very close to a TCP
header?</blockquote><br>
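To make that comparison concrete, here is a hypothetical header carrying exactly the fields listed in the question (source/destination ports for multiplexing, seq/ack numbers for resync, an advertised window for flow control), packed with Python's struct module. The field order and widths are assumptions borrowed from TCP (16-bit ports and window, 32-bit seq/ack); this is not the RDS wire format.<br><br>

```python
import struct
from typing import NamedTuple

# Network byte order, no padding: two 16-bit ports, 32-bit seq and ack,
# 16-bit window (widths as in TCP; the layout itself is hypothetical).
HDR = struct.Struct("!HHIIH")


class Header(NamedTuple):
    src_port: int   # multiplexes connections on the sending node
    dst_port: int   # multiplexes connections on the receiving node
    seq: int        # sequence number of this msg (resync / dedup)
    ack: int        # highest sequence number received from the peer
    window: int     # receive space the peer may still use (flow control)


def pack(h: Header) -> bytes:
    return HDR.pack(*h)


def unpack(buf: bytes) -> Header:
    return Header(*HDR.unpack_from(buf))
```

Packing Header(4000, 5000, 1, 0, 65535) yields 14 bytes; the 20-byte fixed TCP header adds the data offset/flags, checksum and urgent pointer fields on top of these.<br><br>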

TCP, as implemented in most OSes, does not provide an end-to-end
acknowledgement to the application. Unless one ties the TCP ACK to the
application's consumption of the received data, there is no way to
ascertain that the application really received the data. The application
would be required to send its own application-level acknowledgement. I
believe the intent is for applications to remain responsible for the
end-to-end receipt of data and for RDS and the interconnect to be
responsible only for the exchange at the lower levels.</blockquote>Yes, a TCP ack only
implies that the peer's TCP stack has received the data; it means nothing
to the application. It is the application that has to send an
application-level ack to its peer.</blockquote><br>
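The distinction drawn above between a transport ACK and an application-level ack can be sketched as follows: the sender considers a msg done only when the peer application acknowledges it after consuming it, so after a catastrophic failure it knows which msgs still need to be resynced. This is a minimal Python model; the class names and the cumulative, in-order ack scheme are assumptions for illustration, not part of RDS or of any TCP API.<br><br>

```python
class AppSender:
    """Tracks msgs until the peer *application* acknowledges them."""

    def __init__(self):
        self.next_id = 0
        self.pending = {}            # msg id -> payload, not yet app-acked

    def send(self, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = payload
        return msg_id, payload       # handed to the transport (RDS, TCP, ...)

    def on_app_ack(self, upto_id):
        # Cumulative app-level ack: the peer app consumed every id <= upto_id.
        for msg_id in [i for i in self.pending if i <= upto_id]:
            del self.pending[msg_id]

    def to_resync(self):
        # After a catastrophic failure these must be replayed or reconciled.
        return sorted(self.pending.items())


class AppReceiver:
    def __init__(self):
        self.inbox = []              # delivered by the transport (already ACKed there)
        self.consumed_upto = -1

    def on_transport_delivery(self, msg_id, payload):
        self.inbox.append((msg_id, payload))

    def consume_one(self):
        # Only after the app actually processes a msg does it ack it.
        msg_id, _payload = self.inbox.pop(0)
        self.consumed_upto = msg_id
        return self.consumed_upto    # value carried in the app-level ack
```

If the transport delivers three msgs but the receiving application has consumed only the first, to_resync() still lists the other two, which is exactly the gap a transport-level ACK cannot close.<br><br>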

TCP ACK was intended to be an end-to-end ACK, but implementations reduced
it to a lower-level ACK only. A TCP stack linked into an application,
as demonstrated by multiple IHVs and by research, does provide an end-to-end
ACK and considerable performance improvements over traditional
network stack implementations. Some claim it is more than good
enough to eliminate the need for protocol offload / RDMA, which is true
for many applications (certainly for most Sockets applications, etc.) but
not when one takes advantage of the RDMA communication paradigm, which
benefits a number of applications.<br><br>

Mike</font></body>

</html>