[openib-general] [PATCH][iWARP] Added provider CM verbs and queryprovider methods

Tom Tucker tom at ammasso.com
Thu Aug 25 07:22:44 PDT 2005


I apologize in advance for the long-winded diatribe below...

Sean Hefty wrote:

 >>>Why include the connection protocol as part of the verbs layer?
 >>>Granted I haven't looked at the iWarp specs in a long time, but I
 >>>don't remember connection establishment being part of the verbs.
 >>>
 >>>
 >>Connection management is not part of the RDMAC verbs, however, we need
 >>some way for transports to "hook in" to the CM. The other approach is
 >>to have a separate registration mechanism for connection management
 >>verbs, but this seemed a little bizarre, so we just extended the
 >>provider verbs.
 >>
 >>Ideas?
 >>
 >>
 >
 >I need to reacquaint myself with iWarp more, but I don't like the idea
 >of adding CM calls as part of the verbs API, and in particular as part
 >of a generic RDMA device structure.
 >
 >
It's definitely a wart from the perspective of IB and/or RNIC Verbs. The
reason is that the verbs paradigm was founded (er confounded) on the
premise that connection management is implemented separate from DTO on
the transport. For IB this has been done and the "verbs" export a
mechanism to send the special MAD messages necessary to build a
connection (don't mean to preach what you know better than me -- just
setting the stage). By contrast, in every TCP implementation I've seen,
connection management is tightly coupled with the transport
implementation itself. In fact the connection management state machine
is part of a big function called tcp_output that is used to send *all*
data on the transport. The intricacies of route lookups and address
resolution are completely hidden in the stack. This is a very
long-winded way of saying that achieving this separation in the
implementation is "technically possible" but practically complicated.

The DAT people "achieved" this by ignoring the problem altogether, i.e.
leaving it to the transport provider library. The Windows Chimney people
"achieved" this by having two monolithic, integrated stacks that could
exchange the connection state once established. The first stack (native
host or "connection management" stack) establishes the connection, then
the offload stack (hw in the adapter) takes over for DTO. Unfortunately,
the "miracle occurs" step where the two stacks exchange state, in-flight
data, timer status, etc. is the subject of great debate and takes us
back to where we are right now -- busted.

Soo. The "solution" we proposed was simply to add the high level
connection management "verbs" to the driver so they could be called by
the upper level transport independent CM or by the ULP directly. The
alternative is to create analogs of the IB connection process for TCP/IP
in the core. For example:

- The ib_at_route_by_ip function is fine for returning the route for an
IP connection (need this anyway).
- There is no path record for IP.
- Create a pseudo-ARP request message that is submitted to a pseudo-mad
interface to TCP verbs.
- Create some goofy pseudo-connect-request message format that gets
submitted to a pseudo-mad interface to the TCP verbs.
- Create another goofy pseudo-connect-accept request message format that
gets submitted to the pseudo-mad interface.

Don't know if this is exactly right, but you get the idea... The
pseudo-mad interface for iwarp would interpret these formats and then --
guess what -- do a connect or accept as the interface is now defined.
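
The indirection described above can be sketched in a few lines of C. This
is only an illustration: the message types, the struct layouts, and every
name below (pseudo_mad_post, tcp_provider, and so on) are hypothetical and
do not come from the patch or from any existing interface. The stub
provider exists only so the dispatch can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types for an imagined pseudo-MAD interface. */
enum pseudo_mad_type {
    PSEUDO_MAD_ARP_REQ,
    PSEUDO_MAD_CONN_REQ,
    PSEUDO_MAD_CONN_ACCEPT
};

/* Hypothetical message format; a real one would carry far more state. */
struct pseudo_mad {
    enum pseudo_mad_type type;
    uint32_t remote_addr;   /* IPv4 address, host byte order */
    uint16_t remote_port;
};

/* Hypothetical operations the TCP/iWARP provider must implement anyway. */
struct tcp_provider {
    int (*resolve)(uint32_t addr);
    int (*connect)(uint32_t addr, uint16_t port);
    int (*accept)(uint32_t addr, uint16_t port);
};

/*
 * Decode the pseudo-MAD and call the matching provider operation.
 * All the message layer adds is a decode step in front of the same
 * connect/accept calls. Returns the operation's result, or -1 on a
 * bad argument or unknown message type.
 */
static int pseudo_mad_post(const struct tcp_provider *p,
                           const struct pseudo_mad *m)
{
    if (p == NULL || m == NULL)
        return -1;

    switch (m->type) {
    case PSEUDO_MAD_ARP_REQ:
        return p->resolve(m->remote_addr);
    case PSEUDO_MAD_CONN_REQ:
        return p->connect(m->remote_addr, m->remote_port);
    case PSEUDO_MAD_CONN_ACCEPT:
        return p->accept(m->remote_addr, m->remote_port);
    }
    return -1;
}

/* Stub provider that only records which operation was reached. */
enum stub_call { CALL_NONE, CALL_RESOLVE, CALL_CONNECT, CALL_ACCEPT };
static enum stub_call last_call = CALL_NONE;

static int stub_resolve(uint32_t addr)
{
    (void)addr;
    last_call = CALL_RESOLVE;
    return 0;
}

static int stub_connect(uint32_t addr, uint16_t port)
{
    (void)addr; (void)port;
    last_call = CALL_CONNECT;
    return 0;
}

static int stub_accept(uint32_t addr, uint16_t port)
{
    (void)addr; (void)port;
    last_call = CALL_ACCEPT;
    return 0;
}

static const struct tcp_provider stub_provider = {
    stub_resolve, stub_connect, stub_accept
};
```

Posting a PSEUDO_MAD_CONN_REQ to the stub provider lands in stub_connect,
which is the point: the message format buys nothing beyond the direct call.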

This seems to me to be a very complicated, slow, and bug-prone way to
make TCP connection establishment look similar to the IB process so that
we can hide it once again under a transport independent CM.
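
For comparison, the "CM verbs in the provider" approach might look roughly
like the sketch below. It is a minimal illustration under stated
assumptions, not the structure from the patch: the struct, the field
names, the sketch_ prefix, and the error code are all invented here. A
device that does not supply connection verbs (an IB HCA, say, which would
use the MAD-based CM instead) simply leaves the pointers NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ERR_UNSUPPORTED (-1)   /* hypothetical error code */

/* Hypothetical connection parameters passed down to the provider. */
struct sketch_conn_param {
    uint32_t remote_addr;   /* IPv4 address, host byte order */
    uint16_t remote_port;
};

/*
 * Hypothetical device structure: the data-transfer verbs are elided and
 * only the optional connection management entry points are shown.
 */
struct sketch_rdma_device {
    const char *name;
    int (*connect)(struct sketch_rdma_device *dev,
                   const struct sketch_conn_param *param);
    int (*accept)(struct sketch_rdma_device *dev,
                  const struct sketch_conn_param *param);
};

/*
 * What a transport-independent CM (or a ULP) would call. It forwards to
 * the provider when the provider implements the verb and reports
 * "unsupported" otherwise.
 */
static int sketch_cm_connect(struct sketch_rdma_device *dev,
                             const struct sketch_conn_param *param)
{
    assert(dev != NULL && param != NULL);

    if (dev->connect == NULL)
        return SKETCH_ERR_UNSUPPORTED;
    return dev->connect(dev, param);
}

/* Stub driver entry point that only counts how often it was called. */
static int stub_connect_calls;

static int stub_connect(struct sketch_rdma_device *dev,
                        const struct sketch_conn_param *param)
{
    (void)dev;
    (void)param;
    stub_connect_calls++;
    return 0;
}
```

The caller never sees how the provider resolves routes or drives its TCP
state machine; that stays inside the driver, where the TCP stack already
keeps it.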

My 2 cents.

 >Does this suggest that each iWarp device driver will need to implement a
 >connection establishment protocol?  Isn't there a way to generalize that
 >into a single iWarp CM module that can sit above multiple devices?  How
 >will connections between different devices be supported?
 >
 >I'm assuming that the Linux kernel will never permit an established
 >connection to be offloaded onto a NIC.  However it seems possible that a
 >new iWarp connection could be done in a common way, with the result
 >passed into the device through the modify QP call as the LLP stream.
 >
 >- Sean