[ofa-general] Infiniband Problems

David Robb DavidRobb at comsci.co.uk
Wed Jun 27 15:16:09 PDT 2007


David Robb wrote:
>
> Roland Dreier wrote:
>>  > Quite possibly, we are using an IBV_QPT_RC transport type. The code
>>  > simply adds another work request with ibv_post_srq_recv(...) after
>>  > each packet is processed. Am I correct in thinking it should start out
>>  > with a stack of work requests in case another packet arrives before
>>  > the current one has been processed?
>>
>> That seems a lot more sensible to me.
I have now set things up as suggested and am getting a very healthy
transfer rate with minimal latency. :-)
>>
>>  > Sorry, I meant to look up in my source code which call was failing but
>>  > forgot to paste it into the question. Yes, I can map 2GB of memory but
>>  > the call to ibv_create_qp() fails with REJ
>>
>> Not sure what you mean ... ibv_create_qp() just returns a pointer or
>> NULL.  What does it mean to "fail with REJ?"
>>   
> OK. I need to rerun this test tomorrow to determine exactly where and 
> how this test is failing. The end result is that the QP creation fails 
> with a REJ. From what I remember, I get a CM event IB_CM_REJ_RECEIVED
> and the remote node is not even aware that anything has tried to connect.
> Thanks for staying with me on this one.
I finally tracked this one down to a problem in our application software.
It was caused by a race condition: our Master instructed a Slave to
initialise and register its service name and ID with the SA, and would
then attempt to create a QP with that slave. If the slave had not yet
registered, this failed with a CM REJ event with reason code
INVALID_SERVICE_ID. I guess that specifying a larger memory region
lengthened the slave's initialisation enough that the SA was not yet
aware of the slave node when the master tried to create the QP.
Anyway, reworking our code has made this more robust and faster at
creating all the connections.
>>  > That's reassuring. Are there any performance penalties for mapping a
>>  > larger region than a smaller region?
>>
>> Not really beyond the general cost of using more memory rather than less.
>>   
Thanks for your help.

David Robb.


