[ofa-general] Question about RDMA CM

Jeff Squyres jsquyres at cisco.com
Wed Sep 17 05:15:13 PDT 2008


On Sep 17, 2008, at 1:45 AM, Doug Ledford wrote:

>> No; we are using ibv_create_qp, and then assigning id->qp afterwards.
>
> Don't do that.  Assume if rdmacm provides an interface for doing
> something, then there is likely a reason.  In this case, when you call
> rdma_create_qp(), it does more than just call ibv_create_qp() and
> ibv_modify_qp() on your behalf.  It also pipes information about the
> state changes in the qp to the kernel rdma_cm module (by writing the
> commands in rdma_cm format to id->channel->fd, which is the rdma_cm fd
> not the qp fd, in places like rdma_init_qp_attr()).

In general, I agree with you (use rdma_create_qp instead of creating
the QP manually).  But I'm looking at the head of the librdmacm git
tree and I don't see what you're talking about.  As far as I can see,
all rdma_create_qp does is call ibv_create_qp and ibv_modify_qp.

The person who initially wrote this code chose to create the qp  
manually for two reasons:

- rdma_create_qp is just a wrapper around ibv_create_qp and
ibv_modify_qp
- other parts of OMPI (that don't use RDMA CM for wireup) call  
ibv_create_qp and ibv_modify_qp

That being said, I actually spent a little time yesterday trying to
convert to rdma_create_qp and ran into problems with the comparison
at the top of rdma_create_qp against the protection domain: somehow
it was failing for me.  It was not immediately obvious to me where
rdmacm was getting that pd from, nor why that comparison would fail.
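
As an illustration only, here is a minimal sketch of how a check like the one described above could fail. It assumes the comparison is between the device context attached to the CM id and the context the protection domain was allocated on, which is an assumption about librdmacm and not something confirmed in this thread. All types and the function below (model_context, model_pd, model_cm_id, model_create_qp_check) are hypothetical stand-ins, not the real verbs or rdmacm structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device context, a protection domain,
 * and a CM id.  These are NOT the real libibverbs/librdmacm types. */
struct model_context { int device_index; };
struct model_pd      { struct model_context *context; };
struct model_cm_id   { struct model_context *verbs; };

/* Assumed shape of the check: the pd must have been allocated on the
 * very same context object that the CM id is bound to.  The comparison
 * is by pointer, so two separate opens of the same device would not
 * match, even though they refer to the same hardware. */
int model_create_qp_check(const struct model_cm_id *id,
                          const struct model_pd *pd)
{
    assert(id != NULL && pd != NULL);
    if (id->verbs != pd->context)
        return -EINVAL;   /* pd belongs to a different context */
    return 0;             /* contexts match; QP creation could proceed */
}
```

Under that assumption, one plausible cause of the failure is a pd that was allocated on a context the application opened itself, while the CM id carries a context that rdmacm opened separately for the same device.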

>> As far as I can tell, I am not sending to the wrong QP.  But it is
>> complex code, so there certainly can be a bug in this area.
>>
>> The thing that is weird for me is that setting rnr_retry to 7 makes
>> it work.
>
> I didn't look into the kernel code so I couldn't venture a guess as to
> whether or not the above is actually a hard requirement, and whether
> or not it would explain the rnr_retry of 7 getting around the race
> condition, but I would think it's plausible.


If all is working properly, an rnr_retry_count of 0 should be
sufficient because there should be no race conditions.  This is what
OMPI has had for years; it's only this new RDMA CM wireup problem that
has forced me to set it to 7.
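
To make the difference between 0 and 7 concrete, here is a small, simplified model of the RNR retry semantics. It assumes the usual InfiniBand convention that the field is 3 bits wide and that the value 7 means "retry indefinitely"; it ignores the RNR timer entirely. The function name send_survives_rnr and its parameters are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* 3-bit field; by convention the value 7 means "retry forever". */
#define RNR_RETRY_INFINITE 7u

/* Simplified model (not a real verbs call): the receiver answers the
 * first `naks_before_ready` delivery attempts with an RNR NAK because
 * no receive buffer is posted yet, then accepts the next attempt.
 * Returns true if the send eventually succeeds, false if the sender
 * exhausts its retries and the send completes in error. */
bool send_survives_rnr(unsigned rnr_retry, unsigned naks_before_ready)
{
    assert(rnr_retry <= RNR_RETRY_INFINITE);
    if (rnr_retry == RNR_RETRY_INFINITE)
        return true;                       /* never gives up */
    /* One initial attempt plus rnr_retry retries are available. */
    return naks_before_ready <= rnr_retry;
}
```

In this model, a value of 0 works only when the receive is always posted before the send arrives (naks_before_ready == 0), which is the "no race conditions" case. A value of 7 succeeds no matter how late the receive is posted, so it can hide an ordering bug instead of fixing it.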

However, as I mentioned before, this is complex code, so it's quite  
possible (likely?) that I have a bug in the code somewhere.  I was  
posting here looking for any possible insights into why this could  
happen.

-- 
Jeff Squyres
Cisco Systems



