[openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

Caitlin Bestler caitlinb at broadcom.com
Tue Nov 8 13:04:11 PST 2005


 


________________________________

	From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Michael Krause
	Sent: Tuesday, November 08, 2005 11:52 AM
	To: Rimmer, Todd
	Cc: openib-general at openib.org
	Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (
ReliableDatagramSockets) to OpenIB
	
	 The entire discussion might be distilled into the following:
	
	- Datagram applications trade reliability for flexibility and
resource savings.  
	
	

Reliable Datagram applications have endpoints that accept messages from
multiple known sources, rather than from a single known source (TCP, RC)
or multiple unknown sources (UDP, RD).
 
This does save resources, but perhaps just as importantly it may reflect
how the application truly thinks of its communication endpoints. Oracle
is not unique in this communication requirement. This is essentially the
interface MPI presents to its users as well.

	 
	 
	- Datagram applications that require reliability have to
re-invent the wheel and, given that it is non-trivial, they often end up
with variable quality and can suffer performance loss if it is done
poorly or the network is very lossy.  Given networks are a lot less lossy
today than in years past, sans congestion drops, one might argue about
whether there is still a significant problem or not.
	
	[cait] 

Standardized congestion control that is not dependent on
application-specific control is highly desirable. In the IP world new
ULPs based upon UDP are heavily discouraged for exactly this reason.

	 
	 
	- The reliable datagram model isn't new - been there, done that
on earlier interconnects - but it isn't free.  IB could have done
something like RDS but the people who pushed the original requirements
(some who are advocating RDS now) did not want to take on the associated
software enablement thus it was subsumed into hardware and made slightly
more restrictive as a result - perhaps more than some people may like.
The only real delta between RDS, in one sense, and the current IB RD is the
number of outstanding messages in flight on a given EEC.  If RD were
re-defined to allow software to recover some types of failures much like
UC, then one could simply use RD.
	
	[cait] 

The RDS API should definitely be compatible with the IB RD service,
especially any later one that solves the crippling limitation on
in-flight messages.
 
Similarly the API should be compatible with IP-based solutions, which,
since it is derived from SOCK_DGRAM, isn't much of a challenge.
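
To make the SOCK_DGRAM point concrete, here is a minimal sketch of the
datagram calling pattern such an API would inherit: one bound endpoint
receiving whole messages from several known peers. It uses plain UDP on
loopback purely as a stand-in, so it is NOT reliable and is not RDS
itself; the helper names (make_endpoint, demo) and the "known" table are
made up for illustration.

```python
import socket


def make_endpoint():
    """Create a datagram socket bound to an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.settimeout(2.0)  # avoid hanging forever if a datagram is lost
    return s


def demo():
    """One receiver, two known senders, message-oriented send/receive."""
    server = make_endpoint()
    a = make_endpoint()
    b = make_endpoint()
    # Hypothetical table of known sources, keyed by (host, port).
    known = {a.getsockname(): "a", b.getsockname(): "b"}

    a.sendto(b"hello from a", server.getsockname())
    b.sendto(b"hello from b", server.getsockname())

    received = []
    for _ in range(2):
        data, addr = server.recvfrom(4096)  # one call returns one message
        if addr in known:                   # ignore unknown sources
            received.append((known[addr], data))

    for s in (server, a, b):
        s.close()
    return received


if __name__ == "__main__":
    print(demo())
```

The application-visible shape (sendto / recvfrom with a peer address per
message) is what carries over; the reliability would have to come from
the service underneath, which UDP here does not provide.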
 

	 
	 
	- RDS does not solve a set of failure models.  For example, if a
RNIC / HCA were to fail, then one cannot simply replay the operations on
another RNIC / HCA without extracting state, etc. and providing some
end-to-end sync of what was really sent / received by the application.
Yes, one can recover from cable or switch port failure by using APM
style recovery but that is only one class of faults.  The harder faults
either result in the end node being cast out of the cluster or see
silent data corruption unless additional steps are taken to
transparently recover - again app writers don't want to solve the hard
problems; they want that done for them.
	[cait] 

This goes to the question of where the Reliable Datagram Service is
implemented. When done as middleware over existing reliable connection
services, the middleware does have a few issues with handling flushed
buffers after an RNIC failure. These issues complicate the
implementation of a zero-copy strategy.
 
But if the endpoint is truly a datagram endpoint then these issues are
the same as for failover of connection-oriented endpoints between two
RNICs/HCAs.
 

	 
	
	- RNIC / HCA provide hardware acceleration and reliable delivery
to the remote RNIC / HCA (not to the application since that is in a
separate fault domain).  Doing software multiplexing over such an
interconnect as envisioned for IB RD is relatively straightforward in many
respects but not a trivial exercise as some might contend.  Yes, people
can point to a small number of lines of code but that is just for the
initial offering and is not an indication of what it might have to
become long-term to add all of the bells-n-whistles that people have
envisioned.
	
	[cait] 

IB RD is not transport-neutral, and has the problem of severe in-flight
limitations that would make it unacceptable to most applications that
would benefit from RDS even if they were 
 
There is no way that iWARP vendors would ever implement a service
designed to match IB RD. An RDS service could be implemented over TCP,
MPA, MS-MPA or SCTP.
 

	 
	- RDS is not an API but a ULP.  It really uses a set of physical
connections which are then used to set up logical application
associations (often referred to as connections but really are not in
terms of the interconnect).  These associations can be quickly
established as they are just control messages over the existing physical
connections.  Again, builds on concepts already shipping in earlier
interconnects / solutions from a number of years back.  Hence, for large
scale applications which are association intensive, RDS is able to
improve the performance of establishing these associations.  While RDS
improves the performance in this regard, its impacts on actual
performance stem more from avoiding some operations thus nearly all of
the performance numbers quoted are really an apple-to-orange comparison.
Nothing wrong with this but people need to keep in mind that things are
not being compared with one another on the same level thus the results
can look more dramatic.
	[cait] 

All correct.
 
The real issue with RDS is whether it makes sense to present this as a
pseudo-transport service, or if it's just a suggested strategy that each
application should implement on its own. From a wire perspective there
isn't much difference. From a development perspective it makes sense as
long as the pseudo-transport definition is indeed defined as though it
were a transport. I believe that is the case here.
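
The association model in the quoted point above (many logical
associations riding on a few physical connections) can be sketched
roughly as follows. This is a toy in-memory model, not the RDS
implementation; the class and field names (NodeLink, Mux, link_setups)
are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeLink:
    """Stands in for one established reliable connection to a peer node."""
    peer: str


class Mux:
    """Maps many logical associations onto one link per peer node."""

    def __init__(self):
        self.links = {}       # peer node -> NodeLink
        self.assocs = {}      # (local_port, peer, remote_port) -> NodeLink
        self.link_setups = 0  # counts the expensive connection setups

    def associate(self, local_port, peer, remote_port):
        """Create a logical association, reusing the peer's link if any."""
        link = self.links.get(peer)
        if link is None:
            # Only the first association to a peer pays for a connection.
            link = NodeLink(peer)
            self.links[peer] = link
            self.link_setups += 1
        # Later associations are just bookkeeping over the existing link.
        self.assocs[(local_port, peer, remote_port)] = link
        return link


if __name__ == "__main__":
    mux = Mux()
    for port in range(100):
        mux.associate(port, "node%d" % (port % 2), 5000)
    print(len(mux.assocs), "associations over", mux.link_setups, "links")
```

In this model the cost of a new association is a dictionary insert,
which is the sense in which association setup gets cheaper; it says
nothing about data-path performance.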
 

	 
	
	- One thing to keep in mind is that RDS is about not doing work
to gain performance and to potentially improve code by eliminating
software that was too complex / difficult to get clean when it was
invoked to recover from fabric-related issues.  This is somewhat the
same logic as used by NFS when migrating to TCP from UDP.   Could not
get clean software so change the underlying comms to push the problem to
a place where it is largely solved.
	
	Now, whether you believe RDS is great or not, it is an attempt
to solve a problem plaguing one class of applications who'd rather not
spend their resources on the problem.  That is a fair thing to consider
if someone else has already done it better using another technology.
One could also consider having IB change the RD semantics to see if that
would solve the problem since it would not require a new ULP to make it
work when you think about it though there is no analog with iWARP.  The
discussion so far has been interesting and I think there is fair push
back to avoid re-inventing the wheel especially on the idea of trying to
do this directly on Ethernet (that seems like just re-inventing all of
that buggy code people stated they could not get right at the app layer
in the first place and largely goes against the logic used to create IB
as well as iWARP's use of TCP in the first place).
	
	
	[cait] 

If there were a definition of a usable RD service from IB then porting
it to iWARP could be considered as well. The key characteristics are
a) message orientation, b) reliable delivery, c) multiple known sources
(not unlimited unknown), and d) multiple in-flight messages with the ULP
being responsible for flow control.
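
As a rough illustration of characteristics a), c) and d), here is a
small in-memory sketch of an endpoint that accepts whole messages only
from a fixed set of known sources and lets the ULP pick a per-source
in-flight window. Reliable delivery (b) is assumed to come from the
layer below and is not modeled. All names here (RDEndpoint, deliver,
receive, window) are hypothetical.

```python
from collections import deque


class RDEndpoint:
    """Toy endpoint: known sources only, ULP-chosen in-flight window."""

    def __init__(self, known_sources, window):
        self.known = set(known_sources)
        self.window = window  # max undelivered messages per source
        self.in_flight = {src: 0 for src in self.known}
        self.inbox = deque()

    def deliver(self, src, message):
        """Accept one whole message; return False if the window is full."""
        if src not in self.known:
            raise PermissionError("unknown source: %r" % (src,))
        if self.in_flight[src] >= self.window:
            return False  # the ULP's flow control must hold the sender back
        self.in_flight[src] += 1
        self.inbox.append((src, bytes(message)))
        return True

    def receive(self):
        """Hand one message to the application and free one window slot."""
        src, message = self.inbox.popleft()
        self.in_flight[src] -= 1
        return src, message


if __name__ == "__main__":
    ep = RDEndpoint({"n1", "n2"}, window=2)
    print(ep.deliver("n1", b"m1"), ep.deliver("n1", b"m2"))
    print(ep.deliver("n1", b"m3"))  # window of 2 is already full
    print(ep.receive())
```

With window set to 1 this behaves like the one-message-at-a-time
restriction discussed above; a larger window is the "multiple in-flight
messages" case.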
 

	 
	 
	 
