<html>

<body>

<font size=3>At 03:02 PM 11/4/2005, Rimmer, Todd wrote:<br>

<blockquote type=cite class=cite cite="">> Bob wrote,<br>

> Perhaps if tunneling udp packets over RC connections rather

than<br>

> UD connections provides better performance, as was seen in the

RDS<br>

> experiment, then why not just convert<br>

> IPoIB to use a connected model (rather than datagrams)<br>

> and then all existing IP upper level<br>

> protocols would could benefit, TCP, UDP, SCTP, ....<br><br>

This would miss the second major improvement of RDS, namely removing the

need for the application to perform timeouts and retries on datagram

packets.  If Oracle ran over UDP/IP/IPoIB it would not be guaranteed

a loss-less reliable interface.  If UDP/IP/IPoIB provided a

loss-less reliable interface it would likely break or affect other UDP

applications which are expecting a flow controlled

interface.</blockquote><br>

The entire discussion might be distilled into the following:<br><br>

- Datagram applications trade reliability for flexibility and resource

savings.  <br><br>

- Datagram applications that require reliability have to re-invent the

wheel and, given that it is non-trivial, the results are often of variable quality and

can suffer performance loss if done poorly or if the network is very

lossy.  Given networks are a lot less lossy today than in years past,

sans congestion drops, one might argue about whether there is still a

significant problem or not.<br><br>

- The reliable datagram model isn't new - been there, done that on

earlier interconnects - but it isn't free.  IB could have done

something like RDS but the people who pushed the original requirements

(some who are advocating RDS now) did not want to take on the associated

software enablement thus it was subsumed into hardware and made slightly

more restrictive as a result - perhaps more than some people may

like.  The only real delta between RDS, in one sense, and the current IB

RD is the number of outstanding messages in flight on a given EEC. 

If RD were re-defined to allow software to recover some types of failures

much like UC, then one could simply use RD.<br><br>

- RDS does not solve a set of failure models.  For example, if a

RNIC / HCA were to fail, then one cannot simply replay the operations on

another RNIC / HCA without extracting state, etc. and providing some

end-to-end sync of what was really sent / received by the

application.  Yes, one can recover from cable or switch port failure

by using APM style recovery but that is only one class of faults. 

The harder faults either result in the end node being cast out of the

cluster or see silent data corruption unless additional steps are taken

to transparently recover - again app writers don't want to solve the hard

problems; they want that done for them.<br><br>

- RNIC / HCA provide hardware acceleration and reliable delivery to the

remote RNIC / HCA (not to the application since that is in a separate

fault domain).  Doing software multiplexing over such an

interconnect as envisioned for IB RD is relatively straightforward in many

respects but not a trivial exercise as some might contend.  Yes,

people can point to a small number of lines of code but that is just for

the initial offering and is not an indication of what it might have to

become long-term to add all of the bells-n-whistles that people have

envisioned.<br><br>

- RDS is not an API but a ULP.  It really uses a set of physical

connections which are then used to set up logical application

associations (often referred to as connections but really are not in

terms of the interconnect).  These associations can be quickly

established as they are just control messages over the existing physical

connections.  Again, this builds on concepts already shipping in earlier

interconnects / solutions from a number of years back.  Hence, for

large scale applications which are association intensive, RDS is able to

improve the performance of establishing these associations.  While

RDS improves the performance in this regard, its impacts on actual

performance stem more from avoiding some operations thus nearly all of

the performance numbers quoted are really an apples-to-oranges

comparison.  Nothing wrong with this but people need to keep in mind

that things are not being compared with one another on the same level

thus the results can look more dramatic.<br><br>

- One thing to keep in mind is that RDS is about not doing work to gain

performance and to potentially improve code by eliminating software that

was too complex / difficult to get clean when it was invoked to recover

from fabric-related issues.  This is somewhat the same logic as used

by NFS when migrating to TCP from UDP.   The software could not be made

clean, so the underlying comms were changed to push the problem to a place

where it is largely solved.<br><br>
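To make concrete what "re-inventing the wheel" means for a datagram application, here is a minimal Python sketch of stop-and-wait retry with duplicate suppression.  It is an illustration only: LossyLink, transmit and send_reliably are hypothetical names, the link is a toy in-memory stand-in rather than UDP or IB, and a timeout is modeled simply as transmit returning False.<br><br>
<pre>
```python
import itertools


class LossyLink:
    """Toy unreliable datagram path (hypothetical; not UDP, IB, or RDS)."""

    def __init__(self, fates):
        # fates: per-transmission outcome, cycled forever.
        #   "ok"       datagram and ack both arrive
        #   "drop"     datagram is lost
        #   "drop_ack" datagram arrives but the ack is lost
        self.fates = itertools.cycle(fates)
        self.seen = set()       # sequence numbers the receiver has accepted
        self.delivered = []     # payloads handed to the receiving application

    def transmit(self, seq, payload):
        """Send one datagram; return True only if the sender sees an ack."""
        fate = next(self.fates)
        if fate == "drop":
            return False        # never arrived; sender will time out
        if seq not in self.seen:
            # Receiver-side duplicate suppression: a retransmission after
            # a lost ack must not be delivered a second time.
            self.seen.add(seq)
            self.delivered.append(payload)
        return fate == "ok"     # "drop_ack": delivered, but sender cannot tell


def send_reliably(link, messages, max_tries=5):
    """Stop-and-wait: retransmit each message until acked or tries run out.

    Returns the total number of transmissions used.
    """
    transmissions = 0
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            if link.transmit(seq, payload):
                break           # acked; move on to the next message
        else:
            raise TimeoutError(
                f"message {seq} not acknowledged after {max_tries} tries"
            )
    return transmissions
```
</pre>
With the fates cycling through "drop", "drop_ack", "ok", sending two messages costs six transmissions and the receiver still sees each payload exactly once.  Even this toy needs sequence numbers, a retry limit and receiver-side state, and it says nothing about windowing, reordering or node failure, which is where the real implementations tend to go wrong.<br><br>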

Now, whether you believe RDS is great or not, it is an attempt to solve a

problem plaguing one class of applications whose developers would rather not spend their

resources on the problem.  That is a fair thing to consider if

someone else has already done it better using another technology. 

One could also consider having IB change the RD semantics to see if that

would solve the problem, since that would not require a new ULP to make it

work, though there is no analog with iWARP. 

The discussion so far has been interesting and I think there is fair push

back to avoid re-inventing the wheel especially on the idea of trying to

do this directly on Ethernet (that seems like just re-inventing all of

that buggy code people stated they could not get right at the app layer

in the first place and largely goes against the logic used to create IB

as well as iWARP's use of TCP in the first place).<br><br>

Mike</font></body>

</html>