[openib-general] RDMA connection and address translation API

Steve Wise swise at ammasso.com
Wed Aug 24 07:59:12 PDT 2005


Roland, this looks good!  A few comments below...

 

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier
> Sent: Wednesday, August 24, 2005 12:07 AM
> To: openib-general at openib.org
> Subject: [openib-general] RDMA connection and address translation API
> 
> At the OpenIB workshop on Monday, we had some discussion about a
> high-level transport-neutral API for connection handling.  After
> giving the topic some more thought, I've come to the conclusion that
> neither the kDAPL API nor the new API that was presented are usable.
> In this email, I'll try to detail my reasoning and sketch what I
> believe is the correct API.
> 
> The new API that we looked at was essentially the following (I'm
> recreating this from memory, so I apologize if I misrepresent it):
> 
>     listen(local_ip_address, service_id, listen_callback)
>     connect(local_qp, remote_ip_address, qos, service_id,
>             private_data, connect_callback)
> 
> We already discussed the problem with having the listen callback pass
> the consumer a remote source address -- doing this requires the
> connection handling module to do an ATS reverse lookup in the IB case,
> which the consumer might not want.  I think there's agreement that the
> correct thing here is for the listen callback to pass a transport
> address to the consumer and provide a function that the consumer can
> call to perform an ATS reverse lookup if desired.  This isn't a major
> problem and can be dealt with.
> 
> However, there's another problem with trying to lump address
> translation and connection into a single "connect" call, and this
> problem looks fundamental and fatal to me.  The connect call takes a
> QP pointer, but to create a QP the consumer needs to know which local
> device to use.  However, the consumer doesn't know which device to use
> until the destination address has been resolved to a route, including
> a local interface.
> 
> As far as I can tell, kDAPL punts on this and simply requires the
> consumer to handle the route lookup itself before calling
> dat_ep_connect().  It seems that current kDAPL consumers similarly
> punt on this issue: the iSER initiator and the NFS-RDMA client both
> just use a single device which is statically discovered at init time.
>

Yes, DAPL punts on this.

> It seems that the kDAPL connection model has a serious flaw, in that
> it pushes the complexity of route lookup into the consumer.  Further,
> we have strong evidence that this routing code is hard to write and
> that consumers will just ignore this complexity and hard-code
> solutions that don't work under all configurations.
>

I agree!
 
> With this in mind, I believe that the connection API needs to be
> something more like the following:
> 
>     rdma_resolve_address():
>         inputs: dest IP address, qos, npaths,
>             done callback, opaque context
>         done callback params: status, local RDMA device,
>             RDMA transport address, context
> 
>         This function starts the process of resolving an IP address to
>         an RDMA device and address.  When the resolution is complete,
>         the callback is called with a status.  If the status is
>         "success" then the callback also gets the device pointer and
>         transport address (as well as the original context that the
>         consumer passed in).
> 
>         The "RDMA transport address" type is a union containing
>         transport-dependent data.  In the IB case, it's all of the
>         SGID, DGID, SLID, DLID, SL etc. that we know and love.  In the
>         iWARP case, it's the source IP, destination IP and QOS.
> 
>         npaths can be either 1 or 2 in the IB case; if it's 2, then
>         the resolver will try to find a primary and alternate path for
>         APM.  In the iWARP case, I guess npaths will always be 1, and
>         I guess anyone who wants to use iWARP over multihomed SCTP
>         will probably have to use some lower-level API.
> 
>         By the way, we may also have to have the option of passing in
>         a local netdev so that we can handle link-local IPv6
>         addresses.  There may be other cases I haven't thought of yet.
>         I just hope we can avoid going all the way to the horror of
>         the getaddrinfo() API.
> 
>         I also hope we can agree to use IPoIB ARP to resolve the
>         address in the IB case; having a flag or some other hack in
>         the API to expose the option of ATS seems unacceptably ugly.
> 
>     rdma_connect():
>         inputs: local QP, RDMA transport address, destination service,
>             private data, timeout, event callback, opaque context
> 
>         This function takes the resolved address and actually connects.
> 
>         I'm not sure how we want to abstract the IB service vs. iWARP
>         TCP port number difference.  I guess it's OK to have iWARP
>         consumers stick their (16-bit) port number in a 64-bit
>         parameter, even if it's not the prettiest API.
> 
> To head off the knee-jerk objection: this API does NOT require any
> transport-specific code in consumers (unless a particular consumer
> WANTS to look inside the RDMA transport address).  Code to connect
> would be as simple as:
> 
>     rdma_resolve_address(...);
>     /* wait for resolution */
>     ib_create_qp(...); /* use device pointer we got from
>                           rdma_resolve_address() */
>     rdma_connect(...); /* pass transport address we got from
>                           rdma_resolve_address() */
>     /* wait for connection to finish... */
> 
> The listen side is even simpler:
> 
>     rdma_listen():
>         inputs: local service, event callback, consumer context
> 
>         Wait for connection requests and pass events to the consumer's
>         callback.  I'm not sure if/how we want to support binding to
>         a particular IP address.  The current IB CM in Linux doesn't
>         support binding a listen to a single device or port, and even
>         if it did it's not clear how to handle binding to one IP
>         address when a port has more than one IP.
> 
>         I guess the event callback would receive a device pointer and
>         the same RDMA transport address union I talked about above
>         when discussing address resolution.
> 
>         It would be possible to have another function like
>         rdma_getpeername() that takes the transport address and
>         returns a source IP address.  In the IB case this would do an
>         ATS reverse lookup.  However, I hate this idea.  iSER already
>         uses the CM private data to pass the source IP in the IB case,
>         and I would much rather fix NFS/RDMA to do the same thing (so
>         we can just kill ATS as an address resolution method).
> 
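
For illustration only, here is a rough C sketch of the resolve-then-connect
flow quoted above.  Everything in it is hypothetical: the sketch_* names,
the struct layouts and the fake resolver are assumptions made for this
example and are not an existing kernel interface.  The resolver completes
synchronously and always reports an iWARP-style address, just to show how
the consumer gets its device pointer from the callback before any QP
exists.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not an existing kernel API. */
enum sketch_transport { SKETCH_IB, SKETCH_IWARP };

struct sketch_device {
    const char *name;
    enum sketch_transport transport;
};

/* "RDMA transport address": a union of transport-dependent data. */
struct sketch_addr {
    enum sketch_transport transport;
    union {
        struct { uint8_t sgid[16], dgid[16]; uint16_t slid, dlid; uint8_t sl; } ib;
        struct { uint32_t src_ip, dst_ip; uint8_t qos; } iwarp;
    } u;
};

typedef void (*sketch_resolve_cb)(int status, struct sketch_device *dev,
                                  const struct sketch_addr *addr, void *context);

static struct sketch_device sketch_dev0 = { "dev0", SKETCH_IWARP };

/* Stand-in resolver: completes synchronously and pretends every nonzero
 * destination is reachable through sketch_dev0.  A real resolver would be
 * asynchronous and would consult routing and ARP. */
int sketch_resolve_address(uint32_t dst_ip, uint8_t qos, int npaths,
                           sketch_resolve_cb done, void *context)
{
    struct sketch_addr addr = { 0 };

    if (done == NULL || npaths != 1)
        return -1;                      /* this sketch handles one path only */
    if (dst_ip == 0) {
        done(-1, NULL, NULL, context);  /* resolution failed */
        return 0;
    }
    addr.transport = SKETCH_IWARP;
    addr.u.iwarp.src_ip = 0x0a000001u;  /* made-up local address */
    addr.u.iwarp.dst_ip = dst_ip;
    addr.u.iwarp.qos = qos;
    done(0, &sketch_dev0, &addr, context);
    return 0;
}

enum { CONN_IDLE, CONN_RESOLVED, CONN_FAILED };

struct sketch_conn {
    struct sketch_device *dev;  /* known only after resolution */
    struct sketch_addr addr;
    uint64_t service;           /* 16-bit port carried in a 64-bit field */
    int state;
};

/* Resolution callback: record the device and address.  A real consumer
 * would now create its QP on c->dev and then issue the connect. */
static void on_resolved(int status, struct sketch_device *dev,
                        const struct sketch_addr *addr, void *context)
{
    struct sketch_conn *c = context;

    if (status != 0) {
        c->state = CONN_FAILED;
        return;
    }
    c->dev = dev;
    c->addr = *addr;
    c->state = CONN_RESOLVED;
}

int sketch_start_connect(struct sketch_conn *c, uint32_t dst_ip, uint16_t port)
{
    c->dev = NULL;
    c->state = CONN_IDLE;
    c->service = port;
    return sketch_resolve_address(dst_ip, 0, 1, on_resolved, c);
}
```

The consumer code never looks inside the union; it only copies the address
and hands it on, which is the transport-neutral property the proposal is
after.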

I think we should allow a ULP to listen on a specific IP address and
device; servers often do this to limit the scope of a service.  I was
thinking such a ULP would simply walk the list of IB devices and issue
an rdma_listen() on each device.  But the ULP should also be able to
pass down the local IP address on which the device should listen.
Consider an NFS/RDMA server ULP whose exports are limited to a single
subnet or interface.
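
A rough C sketch of the per-device listen idea, to make the suggestion
concrete.  All names here (sketch_dev, sketch_listen(), SKETCH_ANY and so
on) are hypothetical and only stand in for whatever rdma_listen() ends up
looking like; the sketch also assumes one IPv4 address per device, which
the real thing could not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not an existing kernel API. */
#define SKETCH_ANY 0u   /* wildcard: accept on whatever address the device has */

struct sketch_dev {
    const char *name;
    uint32_t ip;        /* assumes a single IPv4 address per device */
};

typedef void (*sketch_event_cb)(int event, void *context);

struct sketch_listener {
    const struct sketch_dev *dev;
    uint32_t local_ip;
    uint64_t service;
    sketch_event_cb cb;
    void *context;
};

/* Listen on one device.  Fails if a specific local IP was requested and
 * this device does not own it. */
int sketch_listen(const struct sketch_dev *dev, uint32_t local_ip,
                  uint64_t service, sketch_event_cb cb, void *context,
                  struct sketch_listener *out)
{
    if (dev == NULL || out == NULL)
        return -1;
    if (local_ip != SKETCH_ANY && local_ip != dev->ip)
        return -1;
    out->dev = dev;
    out->local_ip = (local_ip == SKETCH_ANY) ? dev->ip : local_ip;
    out->service = service;
    out->cb = cb;
    out->context = context;
    return 0;
}

/* Walk the device list and listen on every device that matches.
 * Returns the number of listeners set up in out[]. */
size_t sketch_listen_all(const struct sketch_dev *devs, size_t ndevs,
                         uint32_t local_ip, uint64_t service,
                         sketch_event_cb cb, void *context,
                         struct sketch_listener *out, size_t max_out)
{
    size_t i, n = 0;

    for (i = 0; i < ndevs && n < max_out; i++) {
        if (sketch_listen(&devs[i], local_ip, service, cb, context,
                          &out[n]) == 0)
            n++;
    }
    return n;
}
```

With SKETCH_ANY the ULP ends up with one listener per device; with a
specific address it ends up with a listener only on the device that owns
that address, which is the scoping an NFS/RDMA server would want for
per-subnet exports.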




