[ofa-general] Question about RDMA CM

Doug Ledford dledford at redhat.com
Tue Sep 16 22:45:23 PDT 2008


On Tue, 2008-09-16 at 16:05 -0400, Jeff Squyres wrote:
> On Sep 16, 2008, at 3:23 PM, Davis, Arlin R wrote:
> 
> >> 2. Due to some uninteresting race conditions, we only allow
> >> connections to be made "in one direction" (the lower (IP address,
> >> port) tuple is the initiator).  If the "wrong" MPI process desires to
> >> make a connection, it makes a bogus QP and initiates an
> >> rdma_connect().  The receiver process then gets the CONNECT_REQUEST
> >> event, detects that the connection is coming the "wrong" way,
> >> initiates the connection in the "right" direction, and then rejects
> >> the "wrong" connection.  The initiator expects the rejection, and
> >> simply waits for the CONNECT_REQUEST coming in the other direction.
> >
> > Do you use rdma_cm to create QP's?
> 
> No; we are using ibv_create_qp, and then assigning id->qp afterwards.

Don't do that.  Assume that if rdmacm provides an interface for doing
something, there is likely a reason for it.  In this case,
rdma_create_qp() does more than just call ibv_create_qp() and
ibv_modify_qp() on your behalf.  It also passes information about the
QP's state changes to the kernel rdma_cm module, in places like
rdma_init_qp_attr(), by writing commands in rdma_cm format to
id->channel->fd (which is the rdma_cm fd, not the QP fd).

> As far as I can tell, I am not sending to the wrong QP.  But it is  
> complex code, so there certainly can be a bug in this area.
> 
> The thing that is weird for me is that setting rnr_retry to 7 makes it  
> work.

I haven't looked into the kernel code, so I can't say whether the above
is actually a hard requirement, or whether it would explain why an
rnr_retry of 7 (which tells the HCA to retry indefinitely after an RNR
NAK) gets around the race condition, but I think it's plausible.
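For reference, the one-direction rule from Jeff's original description
(the lower (IP address, port) tuple is the initiator) reduces to a tuple
comparison.  This is a minimal sketch assuming IPv4 endpoints stored in
struct sockaddr_in, with each process already knowing its own and its
peer's tuple (for example, exchanged out of band); the name
should_initiate() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: returns true if the local endpoint should
 * initiate, i.e. its (IP address, port) tuple is the lower one.
 * Fields are converted to host byte order so that "lower" means
 * numerically lower.  For identical tuples it returns false on both
 * sides, which cannot happen for two distinct endpoints. */
bool should_initiate(const struct sockaddr_in *local,
                     const struct sockaddr_in *remote)
{
    uint32_t la = ntohl(local->sin_addr.s_addr);
    uint32_t ra = ntohl(remote->sin_addr.s_addr);

    if (la != ra)
        return la < ra;
    return ntohs(local->sin_port) < ntohs(remote->sin_port);
}
```

In the scheme Jeff describes, a process that receives a CONNECT_REQUEST
while should_initiate() is true for it knows the request came the
"wrong" way: it starts its own connection and rejects the incoming one
with rdma_reject().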

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
