[openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB

Michael Krause krause at cup.hp.com
Tue Nov 8 11:52:07 PST 2005


At 03:02 PM 11/4/2005, Rimmer, Todd wrote:
> > Bob wrote,
> > Perhaps if tunneling udp packets over RC connections rather than
> > UD connections provides better performance, as was seen in the RDS
> > experiment, then why not just convert
> > IPoIB to use a connected model (rather than datagrams)
> > and then all existing IP upper level
> > protocols could benefit: TCP, UDP, SCTP, ...
>
>This would miss the second major improvement of RDS, namely removing the 
>need for the application to perform timeouts and retries on datagram 
>packets.  If Oracle ran over UDP/IP/IPoIB it would not be guaranteed a 
>lossless, reliable interface.  If UDP/IP/IPoIB provided a lossless, 
>reliable interface it would likely break or affect other UDP applications 
>which are not expecting a flow-controlled interface.

The entire discussion might be distilled into the following:

- Datagram applications trade reliability for flexibility and resource 
savings.

- Datagram applications that require reliability have to re-invent the 
wheel, and given that this is non-trivial, the results are of variable 
quality and can suffer performance loss if done poorly or if the network 
is very lossy.  Given that networks are a lot less lossy today than in 
years past, congestion drops aside, one might argue about whether there is 
still a significant problem.  (A sketch of the wheel in question follows 
below.)
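
For concreteness, here is a minimal sketch of that wheel, using plain BSD 
sockets over UDP.  The timeout, retry count, and ack format are all 
hypothetical; a real implementation also needs sequence numbers, duplicate 
suppression, and congestion handling, which is exactly where the variable 
quality creeps in:

    /* Minimal re-invented reliability over UDP: send, wait for an ack
     * with a timeout, retransmit a bounded number of times.  All
     * constants and the ack format are illustrative only. */
    #include <stddef.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <netinet/in.h>

    #define MAX_RETRIES 5

    static int reliable_send(int fd, const void *msg, size_t len,
                             const struct sockaddr_in *dst)
    {
        char ack[16];
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            if (sendto(fd, msg, len, 0,
                       (const struct sockaddr *)dst, sizeof(*dst)) < 0)
                return -1;

            /* Wait up to one second for a reply, then retransmit. */
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            struct timeval tv = { 1, 0 };
            int n = select(fd + 1, &rfds, NULL, NULL, &tv);
            if (n > 0 && recv(fd, ack, sizeof(ack), 0) > 0)
                return 0;                 /* acked */
            /* n == 0: timed out, fall through and retry */
        }
        return -1;                        /* gave up */
    }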

- The reliable datagram model isn't new - been there, done that on earlier 
interconnects - but it isn't free.  IB could have done something like RDS, 
but the people who pushed the original requirements (some of whom are 
advocating RDS now) did not want to take on the associated software 
enablement, so it was subsumed into hardware and made slightly more 
restrictive as a result - perhaps more than some people would like.  The 
only real delta between RDS, in one sense, and the current IB RD is the 
number of outstanding messages in flight on a given EEC.  If RD were 
re-defined to allow software to recover from some types of failures, much 
like UC, then one could simply use RD.

- RDS does not solve a whole set of failure models.  For example, if an 
RNIC / HCA were to fail, then one cannot simply replay the operations on 
another RNIC / HCA without extracting state, etc., and providing some 
end-to-end sync of what was really sent / received by the application.  
Yes, one can recover from a cable or switch port failure by using APM-style 
recovery, but that is only one class of faults.  The harder faults either 
result in the end node being cast out of the cluster or in silent data 
corruption, unless additional steps are taken to transparently recover - 
again, app writers don't want to solve the hard problems; they want that 
done for them.

- RNICs / HCAs provide hardware acceleration and reliable delivery to the 
remote RNIC / HCA (not to the application, since that is in a separate 
fault domain).  Doing software multiplexing over such an interconnect, as 
envisioned for IB RD, is relatively straightforward in many respects, but 
it is not the trivial exercise some might contend.  Yes, people can point 
to a small number of lines of code, but that is just the initial offering 
and is not an indication of what it might have to become long-term to add 
all of the bells-n-whistles people have envisioned.  (A sketch of the basic 
multiplexing scheme follows below.)
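
To make the multiplexing point concrete, here is a rough sketch of the core 
scheme: one reliable connection per remote node, lazily established, with 
every logical flow demultiplexed by a small header.  All of the names here 
(rc_conn, rc_connect, rc_send) are invented for illustration and stand in 
for whatever the underlying RC / verbs layer actually provides:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_NODES 256

    struct rc_conn;                      /* opaque reliable connection */
    extern struct rc_conn *rc_connect(uint32_t node_ip);      /* assumed */
    extern int rc_send(struct rc_conn *c,
                       const void *buf, size_t len);          /* assumed */

    /* Wire header prepended to every datagram so the receiver can demux
     * it back to the right logical association (src/dst port pair). */
    struct mux_hdr {
        uint16_t src_port;
        uint16_t dst_port;
        uint32_t len;
    };

    static struct { uint32_t ip; struct rc_conn *conn; } conn_tab[MAX_NODES];

    /* Find (or lazily establish) the single RC connection to a node.
     * The expensive connection setup happens once; every later
     * "association" to that node is just a header on this shared pipe. */
    static struct rc_conn *conn_for(uint32_t node_ip)
    {
        for (int i = 0; i < MAX_NODES; i++) {
            if (conn_tab[i].conn && conn_tab[i].ip == node_ip)
                return conn_tab[i].conn;
            if (conn_tab[i].conn == NULL) {
                conn_tab[i].ip = node_ip;
                conn_tab[i].conn = rc_connect(node_ip);
                return conn_tab[i].conn;
            }
        }
        return NULL;                     /* table full */
    }

    int mux_sendto(uint32_t node_ip, uint16_t sport, uint16_t dport,
                   const void *payload, uint32_t len)
    {
        struct rc_conn *c = conn_for(node_ip);
        if (c == NULL)
            return -1;
        struct mux_hdr h = { .src_port = sport, .dst_port = dport,
                             .len = len };
        unsigned char *pkt = malloc(sizeof(h) + len);
        if (pkt == NULL)
            return -1;
        memcpy(pkt, &h, sizeof(h));
        memcpy(pkt + sizeof(h), payload, len);
        int rc = rc_send(c, pkt, sizeof(h) + len); /* RC does ack / retry */
        free(pkt);
        return rc;
    }

Even this toy version glosses over connection teardown, failover, flow 
control between flows sharing a pipe, and locking - the bells-n-whistles 
that grow the line count over time.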

- RDS is not an API but a ULP.  It really uses a set of physical 
connections, which are then used to set up logical application associations 
(often referred to as connections, but they really are not in terms of the 
interconnect).  These associations can be established quickly since they 
are just control messages over the existing physical connections.  Again, 
this builds on concepts already shipping in earlier interconnects / 
solutions from a number of years back.  Hence, for large-scale applications 
that are association-intensive, RDS is able to improve the performance of 
establishing those associations.  While RDS improves performance in this 
regard, its impact on actual performance stems more from avoiding some 
operations, so nearly all of the performance numbers quoted are really an 
apples-to-oranges comparison.  Nothing wrong with this, but people need to 
keep in mind that things are not being compared on the same level, so the 
results can look more dramatic.  (A sketch of the socket-level model 
follows below.)
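
As a concrete illustration of the ULP point, here is roughly what the 
socket-level model looks like, assuming a Berkeley-sockets-style interface 
of the sort proposed for RDS.  The PF_RDS constant, addresses, and ports 
below are illustrative, not authoritative.  Note there is no connect() per 
peer: the destination rides on each send, and any physical connection setup 
happens underneath.

    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef PF_RDS
    #define PF_RDS 21        /* illustrative protocol-family value */
    #endif

    int main(void)
    {
        int fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Bind names the local endpoint; it does not connect anything. */
        struct sockaddr_in local = { 0 };
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = inet_addr("192.168.0.1"); /* hypothetical */
        local.sin_port = htons(5001);                     /* hypothetical */
        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
            perror("bind"); return 1;
        }

        /* Send a datagram: the destination rides in each sendto(); the
         * ULP multiplexes it onto the one physical connection it keeps
         * per node pair, so the "association" costs nothing to set up. */
        struct sockaddr_in peer = { 0 };
        peer.sin_family = AF_INET;
        peer.sin_addr.s_addr = inet_addr("192.168.0.2");  /* hypothetical */
        peer.sin_port = htons(5002);
        const char msg[] = "hello";
        if (sendto(fd, msg, sizeof(msg), 0,
                   (struct sockaddr *)&peer, sizeof(peer)) < 0)
            perror("sendto");

        close(fd);
        return 0;
    }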

- One thing to keep in mind is that RDS is about not doing work in order to 
gain performance, and about potentially improving code by eliminating 
software that was too complex / difficult to get clean when it was invoked 
to recover from fabric-related issues.  This is much the same logic NFS 
used when migrating from UDP to TCP: the software could not be made clean, 
so change the underlying comms and push the problem to a place where it is 
largely solved.

Now, whether you believe RDS is great or not, it is an attempt to solve a 
problem plaguing one class of applications that would rather not spend 
their resources on the problem.  That is a fair thing to consider if 
someone else has already done it better using another technology.  One 
could also consider having IB change the RD semantics to see whether that 
would solve the problem, since it would not require a new ULP to make it 
work - though there is no analog with iWARP.  The discussion so far has 
been interesting, and I think there is fair push-back against re-inventing 
the wheel, especially on the idea of trying to do this directly on Ethernet 
(that seems like re-inventing all of the buggy code people stated they 
could not get right at the app layer in the first place, and it largely 
goes against the logic behind creating IB, as well as iWARP's use of TCP, 
in the first place).

Mike 