[ofa-general] IB post send lost.

Dotan Barak dotanb at dev.mellanox.co.il
Thu Nov 8 06:56:35 PST 2007


Hi.

I need some more info.

Which IB HW do you use?
(you can get this info from ibv_devinfo)

Which IB SW do you use?
(you can get this info from ofed_info)


Dotan

Bharath Ramesh wrote:
> * Dotan Barak (dotanb at dev.mellanox.co.il) wrote:
>   
>> Hi.
>>
>> Bharath Ramesh wrote:
>>     
>>> I have a multi-threaded application. My application has its own message
>>> exchange protocol; it uses IB as the communication layer. I send a lot
>>> of messages, normally on the order of a few tens of thousands. After
>>> some time it seems that one message from one of the nodes is lost. I am
>>> using the RC QP type. This causes the thread to deadlock. The other
>>> threads are still able to exchange messages without any problem
>>> over the same QP. Both ends are using SRQs and there are sufficient
>>> buffers posted, so I don't run out of buffers. I even tried doubling
>>> the buffers posted and I see the same problem again: one message is lost.
>>> ibv_post_send doesn't report any error. I am trying to get this done
>>> for a conference deadline early next week. I would really appreciate any
>>> help in suggesting possibilities which might cause the message to be
>>> dropped without any error being returned.
>>>   
>>>       
>> If you don't have any bugs in your code, the described scenario should 
>> work.
>>
>> I need some more info in order to try to help you:
>>
>> Do you use the same QP from several threads (and post send from all of 
>> them)?
>>     
>
> Yes, I use the same QP from three threads. The application has close
> to 5 threads. The receives are handled by a single thread. Most of the
> sends are posted by a single thread. Occasionally a third thread posts a
> few sends to the QP. The same QP is also used for RDMA Writes. The majority
> of the RDMA Writes are also performed by the same thread that posts
> the majority of the send messages.
>
>   
>> How do you poll the CQ (several threads/one)?
>>     
>
> I have two CQs, one for receive and the other for send. The receive CQ
> is polled only by the receive thread. The send CQ is polled by the three
> threads. It is polled occasionally by the receiver thread to clear out any
> send CQEs, because I use IBV_SEND_SIGNALED once for every 16
> IBV_SEND_INLINE sends. Otherwise the send CQ is polled by the single thread
> that does the majority of the sends. Occasionally the third thread, when
> doing a send, might poll the send CQ as well for a completion CQE in the
> case of an RDMA Write.
>
>   
>> Which HW/SW do you use?
>>     
>
> I am using Yellow Dog Linux 5.0 on Apple Xserves.
>
> Thanks,
>
> Bharath
>
> ---
> Bharath Ramesh       <bramesh at vt.edu>       http://people.cs.vt.edu/~bramesh
>
>
>   
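
The send-side scheme described in the quoted message (IBV_SEND_SIGNALED once
for every 16 IBV_SEND_INLINE sends, with the send CQ drained by several
threads) relies on some bookkeeping that is easy to get subtly wrong, so a
minimal sketch of that bookkeeping may help when reviewing the code. It is
not taken from the application and is not offered as a diagnosis of the lost
message. Everything in it is an assumption made for illustration: the
send_tracker struct, the tracker_* functions, and the SKETCH_* flag values
are hypothetical stand-ins (the real flags are IBV_SEND_SIGNALED and
IBV_SEND_INLINE from <infiniband/verbs.h>), and it assumes that every 16th
send is the signaled one and that the caller holds one lock around both the
tracker update and the matching ibv_post_send, since three threads post to
the QP.

```c
#include <assert.h>

/* Stand-ins for the libibverbs flag bits so the sketch compiles without
 * <infiniband/verbs.h>. The values are illustrative, not the real ones. */
enum {
    SKETCH_SEND_SIGNALED = 1 << 0,
    SKETCH_SEND_INLINE   = 1 << 1
};

#define SIGNAL_INTERVAL 16u /* assumed: every 16th send is signaled */

/* Hypothetical per-QP bookkeeping for selective signaling. */
struct send_tracker {
    unsigned posted_since_signal; /* sends posted since the last signaled one */
    unsigned outstanding;         /* send queue slots not yet known to be free */
    unsigned sq_depth;            /* send queue depth requested at QP creation */
};

void tracker_init(struct send_tracker *t, unsigned sq_depth)
{
    /* With fewer slots than the interval, the queue would fill before any
     * signaled send is posted and no completion would ever free it. */
    assert(sq_depth >= SIGNAL_INTERVAL);
    t->posted_since_signal = 0;
    t->outstanding = 0;
    t->sq_depth = sq_depth;
}

/* Returns the flags to use for the next send, or -1 if the send queue is
 * full and the caller should poll the send CQ before posting. */
int tracker_next_flags(struct send_tracker *t)
{
    int flags = SKETCH_SEND_INLINE;

    if (t->outstanding == t->sq_depth)
        return -1;

    t->outstanding++;
    if (++t->posted_since_signal == SIGNAL_INTERVAL) {
        flags |= SKETCH_SEND_SIGNALED;
        t->posted_since_signal = 0;
    }
    return flags;
}

/* Call once per successful send CQE. On an RC QP, send work requests
 * complete in order, so one signaled completion also retires the
 * unsignaled sends posted before it. */
void tracker_on_completion(struct send_tracker *t)
{
    assert(t->outstanding >= SIGNAL_INTERVAL);
    t->outstanding -= SIGNAL_INTERVAL;
}
```

The point of the sketch is that unsignaled sends still occupy send queue
slots until a later signaled completion is polled, so whichever thread
posts has to be sure that some thread is draining the send CQ, and the
counter has to advance in the same order as the posts.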



