[ofa-general] Infiniband Problems
David Robb
DavidRobb at comsci.co.uk
Wed Jun 27 15:16:09 PDT 2007
David Robb wrote:
>
> Roland Dreier wrote:
>> > Quite possibly, we are using an IBV_QPT_RC transport type. The code
>> > simply adds another work request with ibv_post_srq_recv(...) after
>> > each packet is processed. Am I correct in thinking it should start
>> > out with a stack of work requests in case another packet arrives
>> > before the current one has been processed?
>>
>> That seems a lot more sensible to me.
Have now set things up as suggested and am getting a very healthy
transfer rate with minimal latencies. :-)
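
For anyone finding this thread later, here is a rough toy model of why
the initial stack of work requests matters. It is plain C, not
libibverbs code: the struct, the function names and the burst pattern
are all made up for illustration, and the only assumption is that the
application cannot repost until a burst of back-to-back packets has
finished arriving.

```c
/* Toy model only: counts receive work requests (WRs), no real verbs calls. */
struct recv_queue_model {
    int posted;   /* receive WRs currently available to the hardware */
    int dropped;  /* arrivals that found no posted WR */
};

/* A packet arrives and consumes one posted WR. If none is posted it is
 * counted as dropped (on a real RC QP the sender would typically see a
 * receiver-not-ready NAK and have to retry). */
static int packet_arrives(struct recv_queue_model *q)
{
    if (q->posted == 0) {
        q->dropped++;
        return 0;
    }
    q->posted--;
    return 1;
}

/* The application has processed one packet and reposts its buffer.
 * This is the point where the real code calls ibv_post_srq_recv(). */
static void repost_one(struct recv_queue_model *q)
{
    q->posted++;
}

/* Deliver 'bursts' bursts of 'burst_len' back-to-back packets. The
 * application only gets to repost between bursts. Returns the number
 * of arrivals that found the queue empty. */
static int run_model(int initial_depth, int bursts, int burst_len)
{
    struct recv_queue_model q = { initial_depth, 0 };

    for (int b = 0; b < bursts; b++) {
        int delivered = 0;
        for (int i = 0; i < burst_len; i++)
            delivered += packet_arrives(&q);
        for (int i = 0; i < delivered; i++)
            repost_one(&q);
    }
    return q.dropped;
}
```

With one WR posted up front, run_model(1, 10, 4) loses three arrivals
out of every burst of four; with four posted up front,
run_model(4, 10, 4) loses none.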
>>
>> > Sorry, I meant to look up in my source code which call was failing
>> > but forgot to paste it into the question. Yes, I can map 2GB of
>> > memory but the call to ibv_create_qp() fails with REJ
>>
>> Not sure what you mean ... ibv_create_qp() just returns a pointer or
>> NULL. What does it mean to "fail with REJ?"
>>
> OK. I need to rerun this test tomorrow to determine exactly where and
> how this test is failing. The end result is that the QP creation fails
> with a REJ. From what I remember, I get a CM event IB_CM_REJ_RECEIVED
> and the remote node is not even aware that anything has tried to connect.
> Thanks for staying with me on this one.
Finally tracked this one down to a problem in our application software.
It was caused by a race condition: our Master instructs a Slave to
initialise and register its service name and ID with the SA, and then
attempts to create a QP with the slave. That attempt would fail with a
CM REJ event with reason code INVALID_SERVICE_ID. I guess that
specifying a larger memory region was enough to change the timing such
that the SA was unaware of the slave node when creating the QP.
Anyway, a re-jig of our code has now made this more robust and faster
to create all the connections.
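
In case it helps anyone hitting the same reject: one generic way to
tolerate this kind of race is to retry the connection attempt with a
growing delay until the service has registered. This is not necessarily
what our re-jig does, just a minimal sketch. The callback types,
connect_with_retry() and the fake_slave stand-in are all hypothetical
names, not part of any InfiniBand library.

```c
#include <stddef.h>

/* Hypothetical hooks supplied by the application. */
typedef int  (*connect_fn)(void *ctx);           /* 0 on success, nonzero if rejected */
typedef void (*wait_fn)(unsigned int delay_ms);  /* sleeps for delay_ms milliseconds */

/* Try to connect up to max_attempts times, doubling the delay after
 * each rejection. Returns the attempt number that succeeded, or -1 if
 * every attempt was rejected. */
static int connect_with_retry(connect_fn try_connect, void *ctx,
                              wait_fn wait, int max_attempts,
                              unsigned int delay_ms)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(ctx) == 0)
            return attempt;
        /* Back off before the next try; a NULL wait hook skips the sleep. */
        if (attempt < max_attempts && wait != NULL) {
            wait(delay_ms);
            delay_ms *= 2;
        }
    }
    return -1;
}

/* Stand-in for a slave whose service is not registered yet: it rejects
 * the first 'ready_after' attempts and accepts after that. */
struct fake_slave {
    int attempts;
    int ready_after;
};

static int fake_connect(void *ctx)
{
    struct fake_slave *s = ctx;
    s->attempts++;
    return (s->attempts > s->ready_after) ? 0 : -1;
}
```

With a fake_slave that becomes ready after two rejections,
connect_with_retry(fake_connect, &slave, NULL, 5, 10) succeeds on the
third attempt. Having the slave confirm registration before the master
connects avoids the retries altogether, which is the cleaner fix when
the protocol allows it.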
>> > That's reassuring. Are there any performance penalties for mapping a
>> > larger region than a smaller region?
>>
>> Not really beyond the general cost of using more memory rather than
>> less.
>>
Thanks for your help.
David Robb.