<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.2900.2722" name=GENERATOR></HEAD>

<BODY>


<BLOCKQUOTE 

style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">

  <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

  <HR tabIndex=-1>

  <FONT face=Tahoma size=2><B>From:</B> openib-general-bounces@openib.org 

  [mailto:openib-general-bounces@openib.org] <B>On Behalf Of </B>Michael 

  Krause<BR><B>Sent:</B> Tuesday, November 08, 2005 1:08 PM<BR><B>To:</B> Ranjit 

  Pandit<BR><B>Cc:</B> openib-general@openib.org<BR><B>Subject:</B> Re: 

  [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to 

  OpenIB<BR></FONT><BR></DIV>

  <DIV></DIV><FONT size=3>At 12:33 PM 11/8/2005, Ranjit Pandit wrote:<BR>

  <BLOCKQUOTE class=cite cite="" type="cite">> Mike wrote:<BR>>  - 

    RDS does not solve a set of failure models.  For example, if a RNIC / 

    HCA<BR>> were to fail, then one cannot simply replay the operations on 

    another RNIC /<BR>> HCA without extracting state, etc. and providing some 

    end-to-end sync of<BR>> what was really sent / received by the 

    application.  Yes, one can recover<BR>> from cable or switch port 

    failure by using APM style recovery but that is<BR>> only one class of 

    faults.  The harder faults either result in the end node<BR>> being 

    cast out of the cluster or see silent data corruption unless<BR>> 

    additional steps are taken to transparently recover - again app 

    writers<BR>> don't want to solve the hard problems; they want that done 

    for them.<BR><BR>The current reference implementation of RDS solves the HCA 

    failure case as well.<BR>Since applications don't need to keep connection 

    states, it's easier<BR>to handle cases like HCA and intermediate path 

    failures.<BR>As far as the application is concerned, every sendmsg 'could' 

    result in a<BR>new connection setup in the driver.<BR>If the current path 

    fails, RDS reestablishes a connection, if<BR>available, on a different port 

    or a different HCA , and replays the<BR>failed messages.<BR>Using APM is not 

    useful because it doesn't provide failover across HCAs.</BLOCKQUOTE>

  <DIV><BR>I think others may disagree about whether RDS solves the 

  problem.  You have no way of knowing whether something was received or 

  not into the other node's coherency domain without some intermediary or 

  application's involvement to see the data arrived.  As such, you might 

  see many hardware level acks occur and not know there is a real failure.  

  If an application takes any action assuming that send complete means it is 

  delivered, then it is subject to silent data corruption.  Hence, RDS can 

  replay to its heart's content but until there is an application or middleware 

  level of acknowledgement, you have not solved the fault domain issues.  

  Some may be happy with this as they just cast out the endnode from the cluster 

  / database but others see the loss of a server as a big deal so may not be 

  happy to see this occur.  It really comes down to whether you believe 

  losing a server is worthwhile just for a local failure event which is not 

  fatal to the rest of the server.<BR><BR><SPAN class=945263916-09112005><FONT 

  face=Arial color=#0000ff size=2>[cait] </FONT></SPAN></DIV></BLOCKQUOTE>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>Applications should not infer anything from send 

completion other than that their source</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>buffer is no longer required for the transmit to 

complete.</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005></SPAN></FONT></FONT></FONT> </DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>That is the only assumption that can be supported in a 

transport neutral way.</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005></SPAN></FONT></FONT></FONT> </DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>I'll also point out that even under InfiniBand the fact 

that a send or write has</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>completed does NOT guarantee that the remote peer has 

*noticed* the data.</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>The remote peer could fail *after* the data has been 

delivered to it and before</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>it has had a chance to act upon it. A well-designed 

robust application should</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>never rely on anything other than a peer ack to 

indicate that the peer has truly</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>taken ownership of transmitted 

information.</SPAN></FONT></FONT></FONT></DIV>
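<DIV><FONT face=Arial color=#0000ff size=2>To make the peer-ack point concrete, here is a minimal sketch in Python of a sender that keeps ownership of each message until the peer application acknowledges it. Everything named here (AckingSender, transport_send, on_peer_ack, replay_unacked) is hypothetical and made up for illustration; it is not the RDS reference implementation or any verbs API. The transport is assumed to be a callable that returns once the source buffer is reusable, which is all a send completion tells you.</FONT></DIV>
<PRE>
```python
class AckingSender:
    """Keeps each message until the peer application acknowledges it."""

    def __init__(self, transport_send):
        # Hypothetical callable: hands (seq, payload) to the transport.
        self._send = transport_send
        self._next_seq = 0
        self._unacked = {}  # seq: payload still owned by the sender

    def send(self, payload):
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = payload
        # Returning from the transport only means the buffer is reusable,
        # not that the peer has noticed the data.
        self._send(seq, payload)
        return seq

    def on_peer_ack(self, seq):
        # Only now has the peer taken ownership of the message.
        self._unacked.pop(seq, None)

    def replay_unacked(self):
        # After a path or peer failure, resend whatever was never acknowledged.
        for seq in sorted(self._unacked):
            self._send(seq, self._unacked[seq])

    def pending(self):
        return sorted(self._unacked)


wire = []  # stands in for the transport; records everything handed to it
sender = AckingSender(lambda seq, payload: wire.append((seq, payload)))
sender.send(b"debit 10")
sender.send(b"credit 10")
sender.on_peer_ack(0)      # peer acknowledged message 0 only
sender.replay_unacked()    # message 1 is sent again, message 0 is not
```
</PRE>
<DIV><FONT face=Arial color=#0000ff size=2>Because a replayed message may already have been delivered, the receiving side would have to discard duplicates by sequence number; that part is left out of the sketch.</FONT></DIV>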

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005></SPAN></FONT></FONT></FONT> </DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>The essence of RDS, or any similar solution, is the 

delivery of messages with</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>datagram semantics reliably over point-to-point 

reliable connections. So whatever</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>reliability and fault-tolerance benefits the reliable 

connections provide are inherited by</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>the RDS layer. After that it is mostly a matter of how 

you avoid head-of-line</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>blocking problems when there is no receive buffer. You 

don't want to send</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>an RNR (or drop the DDP Segment under iWARP) because 

*one* endpoint</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>does not have available buffers. Other than that, any 

reliable datagram service</SPAN></FONT></FONT></FONT></DIV>

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005>should be just as reliable as the underlying RC 

service.</SPAN></FONT></FONT></FONT></DIV>
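<DIV><FONT face=Arial color=#0000ff size=2>As an illustration of the head-of-line blocking concern, the following Python sketch shows one possible policy for delivering messages from a single shared reliable connection to several local endpoints. The class and method names (DatagramDemux, on_arrival, post_buffer) and the idea of parking messages for an endpoint that has no buffer are assumptions made for this example, not a description of how RDS actually handles it.</FONT></DIV>
<PRE>
```python
from collections import deque


class DatagramDemux:
    """Delivers messages from one shared reliable connection to many endpoints."""

    def __init__(self):
        self._free = {}    # endpoint: receive buffers currently posted
        self._ready = {}   # endpoint: messages placed into posted buffers
        self._parked = {}  # endpoint: messages waiting for a buffer

    def add_endpoint(self, endpoint, buffers):
        self._free[endpoint] = buffers
        self._ready[endpoint] = deque()
        self._parked[endpoint] = deque()

    def on_arrival(self, endpoint, message):
        # Never stall the shared connection: an endpoint without buffers
        # only delays its own traffic.
        if self._free[endpoint] == 0:
            self._parked[endpoint].append(message)
            return False
        self._free[endpoint] -= 1
        self._ready[endpoint].append(message)
        return True

    def post_buffer(self, endpoint):
        # The application posted one more receive buffer for this endpoint.
        if self._parked[endpoint]:
            self._ready[endpoint].append(self._parked[endpoint].popleft())
        else:
            self._free[endpoint] += 1

    def receive(self, endpoint):
        # Next message in arrival order, or None if nothing is ready.
        if self._ready[endpoint]:
            return self._ready[endpoint].popleft()
        return None


demux = DatagramDemux()
demux.add_endpoint("a", buffers=0)  # endpoint "a" has no receive buffer
demux.add_endpoint("b", buffers=2)
demux.on_arrival("a", "a1")         # parked; the connection keeps flowing
demux.on_arrival("b", "b1")         # delivered despite "a" being full
```
</PRE>
<DIV><FONT face=Arial color=#0000ff size=2>The parked messages have to live somewhere, so a real design would need to bound that backlog, most likely with some form of per-endpoint flow control toward the sender.</FONT></DIV>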

<DIV><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN 

class=945263916-09112005></SPAN></FONT></FONT></FONT> </DIV></FONT></BODY></HTML>