Yes, the application is responsible for detecting lost messages at the
application level; the transport cannot do this.

RDS does not guarantee that a message has been delivered to the
application, only that once the transport has accepted a message it will
deliver it to the remote node in order and without duplication, dealing
with retransmissions and the like due to sporadic or intermittent message
loss over the interconnect. If the current path fails after a send has
been accepted, RDS will transparently fail over to another path and, if
required, will resend or send any already-queued messages to the remote
node, again ensuring that no message is duplicated and that ordering is
preserved. This is no different from APM, except that RDS can do this
across HCAs.
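
To make that boundary concrete, below is a minimal, hypothetical sketch of
what a send looks like at the RDS socket level. It assumes the PF_RDS /
AF_RDS address family and IPv4 node addressing of the Linux reference
implementation; the exact constants and headers may differ from the
contributed code.

/* Hypothetical sketch, not the contributed code: a fire-and-forget
 * send over an RDS socket. Assumes AF_RDS (21 on Linux) and IPv4
 * node addressing. */
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef AF_RDS
#define AF_RDS 21               /* may be missing from older headers */
#endif

int rds_send(const char *local_ip, const char *peer_ip,
             uint16_t port, const void *buf, size_t len)
{
    struct sockaddr_in lcl = {0}, rmt = {0};
    struct iovec iov = { (void *)buf, len };
    struct msghdr msg = {0};

    int fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;

    lcl.sin_family = AF_INET;   /* RDS endpoints are node IP + port */
    lcl.sin_addr.s_addr = inet_addr(local_ip);
    lcl.sin_port = htons(port);
    rmt.sin_family = AF_INET;
    rmt.sin_addr.s_addr = inet_addr(peer_ip);
    rmt.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&lcl, sizeof lcl) < 0) {
        close(fd);
        return -1;
    }

    msg.msg_name = &rmt;
    msg.msg_namelen = sizeof rmt;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* Once sendmsg() succeeds, the transport owns the message:
     * in-order, no-duplicate delivery to the remote *node*, plus
     * retransmission and path failover, are its job. Delivery to
     * the remote *application* is not guaranteed. */
    int rc = sendmsg(fd, &msg, 0) < 0 ? -1 : 0;
    close(fd);
    return rc;
}
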
The application (Oracle, in this case) will deal with detecting a
catastrophic path failure, whether due to a send that does not arrive, a
timed-out response, or a send failure returned from the transport. If
there is no network path to a remote node, we are required to remove that
node from the operating cluster to avoid what is commonly termed a "split
brain" condition, otherwise known as a "partition in time".
BTW, in our case the application failure-domain logic is the same whether
we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc. Basically, if we
cannot talk to a remote node after some defined period of time, we will
remove it from the cluster. The database will then recover all the
interesting state that may have been maintained on the removed node,
allowing the remaining nodes to continue. If communication to the remote
node is later restored, it will be allowed to rejoin the cluster and take
on application load; a sketch of this eviction / rejoin logic follows.
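
A minimal sketch of that transport-independent logic, assuming an
illustrative timeout and data structures (this is not Oracle's actual
membership code):

/* Hypothetical sketch of transport-independent failure-domain logic:
 * evict a peer that has been silent past a deadline; let it rejoin
 * once communication is restored. */
#include <time.h>

enum node_state { NODE_UP, NODE_EVICTED };

struct peer {
    enum node_state state;
    time_t last_heard;          /* updated on every message received */
};

#define COMM_TIMEOUT_SEC 30     /* "some defined period of time" */

/* Called whenever a message arrives from the peer. */
void on_message(struct peer *p, time_t now)
{
    p->last_heard = now;
    if (p->state == NODE_EVICTED)
        p->state = NODE_UP;     /* rejoin; take on application load */
}

/* Called periodically by the membership layer. */
void check_peer(struct peer *p, time_t now)
{
    if (p->state == NODE_UP && now - p->last_heard > COMM_TIMEOUT_SEC) {
        p->state = NODE_EVICTED;
        /* the remaining nodes recover the evicted node's state here,
         * then continue without it */
    }
}
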
<DIV>----- Original Message ----- </DIV>
From: Michael Krause <krause@cup.hp.com>
To: Ranjit Pandit <rpandit@silverstorm.com>
Cc: openib-general@openib.org
Sent: Tuesday, November 08, 2005 4:08 PM
Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

At 12:33 PM 11/8/2005, Ranjit Pandit wrote:
>> Mike wrote:
>> - RDS does not solve a set of failure models. For example, if an RNIC /
>> HCA were to fail, then one cannot simply replay the operations on
>> another RNIC / HCA without extracting state, etc. and providing some
>> end-to-end sync of what was really sent / received by the application.
>> Yes, one can recover from cable or switch port failure by using APM
>> style recovery, but that is only one class of faults. The harder faults
>> either result in the end node being cast out of the cluster or see
>> silent data corruption unless additional steps are taken to
>> transparently recover - again, app writers don't want to solve the hard
>> problems; they want that done for them.
>
> The current reference implementation of RDS solves the HCA failure case
> as well. Since applications don't need to keep connection state, it's
> easier to handle cases like HCA and intermediate path failures. As far
> as the application is concerned, every sendmsg 'could' result in a new
> connection setup in the driver. If the current path fails, RDS
> reestablishes a connection, if available, on a different port or a
> different HCA, and replays the failed messages. Using APM is not useful
> because it doesn't provide failover across HCAs.

I think others may disagree about whether RDS solves the problem. You have
no way of knowing whether something was received into the other node's
coherency domain without some intermediary or the application's
involvement to confirm the data arrived. As such, you might see many
hardware-level acks occur and not know there is a real failure. If an
application takes any action assuming that send complete means the data
was delivered, then it is subject to silent data corruption. Hence, RDS
can replay to its heart's content, but until there is an application- or
middleware-level acknowledgement, you have not solved the fault-domain
issues. Some may be happy with this, as they just cast the end node out of
the cluster / database, but others see the loss of a server as a big deal
and may not be happy to see this occur. It really comes down to whether
you believe losing a server is worthwhile just for a local failure event
which is not fatal to the rest of the server.

APM's value is the ability to recover from link failure. It has the same
value for any other ULP in that it recovers transparently to the ULP.

Mike