[openib-general] RDMA connection and address translation API

Fri Aug 26 15:31:11 PDT 2005

> -----Original Message-----
> From: Guy German [mailto:guyg at voltaire.com] 
> Sent: Friday, August 26, 2005 12:28 PM
> To: Caitlin Bestler; Sean Hefty; James Lentini
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] RDMA connection and address translation API
> 
> > What do you think about this flow?
> > 1. resolve device and port from ip address - synchronous operation
> >    (like at.c resolve_ip)
> > 2. rdma_create_qp (device+port) - modifies qp to init with default
> >    pkey index
> > 3. ib_post_recvs(...);
> > 4. cma_connect - asynchronous at, modify qp with correct pkey index,
> >    cm_connect
> 
> Caitlin wrote:
> >At least with iWARP a QP is not bound to a specific port, or even to an
> >IP Address. It is only bound to the RDMA Device (RNIC) and Protection
> >Domain. The same QP can be re-used for a new connection with a new IP
> >address. Indeed, that is exactly what would happen with
> >application-layer controlled failover (such as iSER).
> 
> In IB, in order to post a receive the QP needs to be in INIT.
> In order to modify the QP to INIT, you need port and pkey_index.
> If iWARP can post a receive without it, the iWARP
> implementation of "rdma_create_qp" can ignore the port attribute.
> 

The closest equivalent of a pkey_index would be the VLAN ID, which
is at L2 and totally transparent to an iWARP QP. You can definitely
post receive buffers before knowing anything about the TCP connection
(or SCTP association/stream) that will provide the LLP service.

> The other option that was suggested to solve the sync
> problem (the need to post receives before connect) is to retrieve
> the path synchronously, which would require unnecessary
> upcall handling for iWARP consumers.
> 
The generic requirement is that the QP passed to the connect
method is ready to be moved to a connected state as soon as
the connection establishment exchanges have finished.

If I follow what you are proposing, you are trying to find a way
to do this for IB automatically as a by-product of determining what
device to use. I don't see any problem with this, as long as the
"port" being returned from the first call is defined in such a
way that it can have a void value when the transport does not need
this refinement. Avoiding transport-dependent steps is good for
encouraging development of RDMA-aware applications.
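
To make the flow discussed above concrete, here is a minimal sketch in C. It is not an existing or proposed API: every type, function and constant in it (the `sk_` names, `SK_PORT_NONE`, the three-state QP model) is hypothetical and only mocks the behaviour under discussion. It assumes that 0 can serve as the "void" port, since HCA ports are numbered from 1, and it collapses the real QP state machines into "idle", "ready for receives" and "connected".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Every name in this sketch is hypothetical; none is an existing API. */
enum sk_transport { SK_IB, SK_IWARP };
enum sk_qp_state { SK_QP_IDLE, SK_QP_RECV_READY, SK_QP_CONNECTED };

#define SK_PORT_NONE 0u /* "void" port: the transport needs no port */

struct sk_dev { enum sk_transport transport; };

/* Result of step 1: the device, plus a port only where one is meaningful. */
struct sk_addr {
    struct sk_dev *dev;
    uint8_t port;
};

struct sk_qp {
    struct sk_dev *dev;
    enum sk_qp_state state;
    uint8_t port;
    uint16_t pkey_index;
    unsigned posted_recvs;
};

/* Step 1: synchronous resolve. iWARP gets the void port. */
static struct sk_addr sk_resolve(struct sk_dev *dev, uint8_t ib_port)
{
    struct sk_addr a;
    assert(dev != NULL);
    a.dev = dev;
    a.port = (dev->transport == SK_IB) ? ib_port : SK_PORT_NONE;
    return a;
}

/* Step 2: create the QP. IB needs a real port and starts with the
 * default pkey index; iWARP simply ignores the port. */
static bool sk_create_qp(struct sk_qp *qp, const struct sk_addr *a)
{
    if (a->dev->transport == SK_IB && a->port == SK_PORT_NONE)
        return false;
    qp->dev = a->dev;
    qp->port = a->port;
    qp->pkey_index = 0; /* default; corrected at connect time */
    qp->posted_recvs = 0;
    qp->state = SK_QP_RECV_READY;
    return true;
}

/* Step 3: receives may be posted before any connection exists. */
static bool sk_post_recv(struct sk_qp *qp)
{
    if (qp->state == SK_QP_IDLE)
        return false;
    qp->posted_recvs++;
    return true;
}

/* Step 4: connect. The QP must already be ready to move to the
 * connected state; only IB applies the resolved pkey index. */
static bool sk_connect(struct sk_qp *qp, uint16_t resolved_pkey_index)
{
    if (qp->state != SK_QP_RECV_READY)
        return false;
    if (qp->dev->transport == SK_IB)
        qp->pkey_index = resolved_pkey_index;
    qp->state = SK_QP_CONNECTED;
    return true;
}
```

The caller runs the same four steps for either transport: resolve, create the QP, post receives, connect. The only transport-specific detail left visible is that the port field may hold the void value.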