[ofa-general] Question about RDMA CM
Jeff Squyres
jsquyres at cisco.com
Wed Sep 17 05:15:13 PDT 2008
On Sep 17, 2008, at 1:45 AM, Doug Ledford wrote:
>> No; we are using ibv_create_qp, and then assigning id->qp afterwards.
>
> Don't do that. Assume if rdmacm provides an interface for doing
> something, then there is likely a reason. In this case, when you call
> rdma_create_qp(), it does more than just call ibv_create_qp() and
> ibv_modify_qp() on your behalf. It also pipes information about the
> state changes in the qp to the kernel rdma_cm module (by writing the
> commands in rdma_cm format to id->channel->fd, which is the rdma_cm fd
> not the qp fd, in places like rdma_init_qp_attr()).
In general, I agree with you (use rdma_create_qp instead of creating
the QP manually). But I'm looking at the head of the librdmacm git tree
and I don't see what you're talking about. All it does is call
ibv_create_qp and ibv_modify_qp.
The person who initially wrote this code chose to create the QP
manually for two reasons:
- rdma_create_qp is just a wrapper around ibv_create_qp and
ibv_modify_qp
- other parts of OMPI (that don't use RDMA CM for wireup) call
ibv_create_qp and ibv_modify_qp
That being said, I actually spent a little time yesterday trying to
convert the code to use rdma_create_qp, and ran into problems with the
comparison against the protection domain at the top of rdma_create_qp:
somehow it was failing for me. It was not immediately obvious to me
where rdmacm was getting that pd from, nor why that comparison would
fail.
>> As far as I can tell, I am not sending to the wrong QP. But it is
>> complex code, so there certainly can be a bug in this area.
>>
>> The thing that is weird for me is that setting rnr_retry to 7 makes
>> it work.
>
> I didn't look into the kernel code so I couldn't venture a guess as to
> whether or not the above is actually a hard requirement, and whether or
> not it would explain the rnr_retry of 7 getting around the race
> condition, but I would think it's plausible.
If all is working properly, an rnr_retry_count of 0 should be
sufficient because there should be no race conditions. This is what
OMPI has used for years; it's only this new RDMA CM wireup problem that
has forced me to set it to 7.
However, as I mentioned before, this is complex code, so it's quite
possible (likely?) that I have a bug in the code somewhere. I was
posting here looking for any possible insights into why this could
happen.
--
Jeff Squyres
Cisco Systems