<html>

<body>

<font size=3>At 10:28 AM 11/9/2005, Rick Frank wrote:<br>

</font><blockquote type=cite class=cite cite=""><font face="arial" size=2>

Yes, the application is responsible for detecting lost msgs at the

application level - the transport cannot do this.<br>

</font><font size=3> <br>

</font><font face="arial" size=2>RDS does not guarantee that a message

has been delivered to the application - just that once the transport has

accepted a msg it will deliver the msg to the remote node in order

without duplication - dealing with retransmissions, etc. due to sporadic /

intermittent msg loss over the interconnect. If after accepting the send

- the current path fails - then RDS will transparently fail over to

another path - and if required will resend / send any already queued msgs

to the remote node - again ensuring that no msg is duplicated and they

are in order.  This is no different than APM - with the exception

that RDS can do this across HCAs. <br>

</font><font size=3> <br>

</font><font face="arial" size=2>The application - Oracle in this case -

will deal with detecting a catastrophic path failure - either due to a

send that does not arrive and/or a timed-out response or send failure

returned from the transport. If there is no network path to a remote node

- it is required that we remove the remote node from the operating

cluster to avoid what is commonly termed a "split brain"

condition - otherwise known as a "partition in time".<br>

</font><font size=3> <br>

</font><font face="arial" size=2>BTW - in our case - the application

failure domain logic is the same whether we are using UDP /  uDAPL /

iTAPI / TCP / SCTP / etc. Basically, if we cannot talk to a remote node

- after some defined period of time - we will remove the remote node from

the cluster. In this case the database will recover all the interesting

state that may have been maintained on the removed node - allowing the

remaining nodes to continue. If later on, communication to the remote

node is restored - it will be allowed to rejoin the cluster and take on

application load. </font></blockquote><br>
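The timeout rule described above (drop a node that has been silent for some defined period, let it rejoin once it is heard from again) can be sketched roughly as follows. This is only an illustration and not Oracle's actual logic; the class name, the method names, and the injectable clock are all assumptions made for the example, and state recovery for the evicted node is left out.<br><br>
<pre>
```python
import time


class ClusterMembership:
    """Hypothetical tracker: evicts peers that have been silent too long."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_heard = {}        # node id -> time of last successful exchange
        self.evicted = set()

    def heard_from(self, node):
        # Any successful exchange refreshes the node; an evicted node rejoins.
        self.evicted.discard(node)
        self.last_heard[node] = self.clock()

    def evict_silent(self):
        """Remove nodes silent for longer than timeout_s and return them."""
        now = self.clock()
        silent = [n for n, t in self.last_heard.items()
                  if now - t > self.timeout_s]
        for n in silent:
            del self.last_heard[n]
            self.evicted.add(n)
        return sorted(silent)

    def members(self):
        return sorted(self.last_heard)
```
</pre>
Note that the check is independent of the transport: it only looks at whether anything has been heard from the peer, not at which API carried it.<br><br>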

One might be able to talk to the remote node across another HCA, but that

does not mean one has an understanding of the state at the remote node

unless the failure is noted and a resync of state occurs or the remote is

able to deal with duplicates, etc.   This has nothing to do

with the API or the transport involved but rather, as Caitlin noted, with the difference

between knowing a send buffer is free vs. knowing that the application

received the data requested.  Therefore, one has only reduced the

reliability / robustness problem space to some extent but has not solved

it by the use of RDS.<br><br>
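To make the distinction concrete, here is a minimal sketch of an application-level acknowledgement layered on top of whatever transport is in use. It is not part of RDS or any existing API; Sender, Receiver, and the transport_send callback are hypothetical names for illustration, and ordering across replays is deliberately not handled.<br><br>
<pre>
```python
class Sender:
    """Keeps every message until the peer application acknowledges it."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}   # seq -> payload

    def send(self, payload, transport_send):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        # Returning from transport_send only means the buffer is reusable;
        # it says nothing about whether the remote application saw the data.
        transport_send(seq, payload)
        return seq

    def on_app_ack(self, seq):
        # Only an acknowledgement from the remote application retires a message.
        self.unacked.pop(seq, None)

    def resync(self, transport_send):
        # After a path or HCA failure, replay everything not yet acknowledged.
        for seq in sorted(self.unacked):
            transport_send(seq, self.unacked[seq])


class Receiver:
    """Delivers each sequence number once, so replays are harmless."""

    def __init__(self):
        self.seen = set()       # unbounded here; a real design would prune it
        self.delivered = []

    def on_message(self, seq, payload):
        if seq not in self.seen:
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq              # value the application would send back as its ack
```
</pre>
The point of the sketch is that the sender's notion of "done" is driven by on_app_ack, not by the transport's send completion, and that the receiver must tolerate duplicates for a replay to be safe.<br><br>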

Mike<br><br>

<blockquote type=cite class=cite cite=""><font size=3> <br>

 <br>

----- Original Message ----- <br>

</font>

<dl>

<dd><b>From:</b> <a href="mailto:krause@cup.hp.com">Michael Krause</a> <br>


<dd><b>To:</b> <a href="mailto:rpandit@silverstorm.com">Ranjit Pandit</a>

<br>


<dd><b>Cc:</b>

<a href="mailto:openib-general@openib.org">openib-general@openib.org</a>

<br>


<dd><b>Sent:</b> Tuesday, November 08, 2005 4:08 PM<br>


<dd><b>Subject:</b> Re: [openib-general] [ANNOUNCE] Contribute

RDS(ReliableDatagramSockets) to OpenIB<br><br>


<dd>At 12:33 PM 11/8/2005, Ranjit Pandit wrote:<br>

<blockquote type=cite class=cite cite="">

<dd>> Mike wrote:<br>


<dd>>  - RDS does not solve a set of failure models.  For

example, if a RNIC / HCA<br>


<dd>> were to fail, then one cannot simply replay the operations on

another RNIC /<br>


<dd>> HCA without extracting state, etc. and providing some end-to-end

sync of<br>


<dd>> what was really sent / received by the application.  Yes,

one can recover<br>


<dd>> from cable or switch port failure by using APM style recovery

but that is<br>


<dd>> only one class of faults.  The harder faults either result

in the end node<br>


<dd>> being cast out of the cluster or see silent data corruption

unless<br>


<dd>> additional steps are taken to transparently recover - again app

writers<br>


<dd>> don't want to solve the hard problems; they want that done for

them.<br><br>


<dd>The current reference implementation of RDS solves the HCA failure

case as well.<br>


<dd>Since applications don't need to keep connection states, it's

easier<br>


<dd>to handle cases like HCA and intermediate path failures.<br>


<dd>As far as the application is concerned, every sendmsg 'could' result in

a<br>


<dd>new connection setup in the driver.<br>


<dd>If the current path fails, RDS reestablishes a connection, if<br>


<dd>available, on a different port or a different HCA, and replays

the<br>


<dd>failed messages.<br>


<dd>Using APM is not useful because it doesn't provide failover across

HCAs.</blockquote><br>


<dd>I think others may disagree about whether RDS solves the

problem.  You have no way of knowing whether something was received

or not into the other node's coherency domain without some intermediary

or application's involvement to see the data arrived.  As such, you

might see many hardware level acks occur and not know there is a real

failure.  If an application takes any action assuming that send

complete means it is delivered, then it is subject to silent data

corruption.  Hence, RDS can replay to its heart's content but until

there is an application or middleware level of acknowledgement, you have

not solved the fault domain issues.  Some may be happy with this as

they just cast out the endnode from the cluster / database but others see

the loss of a server as a big deal so may not be happy to see this

occur.  It really comes down to whether you believe losing a server

is worthwhile just for a local failure event which is not fatal to the

rest of the server.<br><br>


<dd>APM's value is the ability to recover from link failure.  It has

the same value for any other ULP in that it recovers transparently to the

ULP.<br><br>


<dd>Mike <br><br>



</dl></blockquote></body>

</html>