[ofa-general] IB post send lost.

Dotan Barak dotanb at dev.mellanox.co.il
Wed Nov 7 22:07:41 PST 2007


Hi.

Bharath Ramesh wrote:
> I have a multi-threaded application with its own message exchange
> protocol, which uses IB as the communication layer. I send a lot of
> messages, normally on the order of a few tens of thousands. After some
> time, one message from one of the nodes appears to be lost. I am using
> the RC QP type. This causes the thread to deadlock. The other threads
> are still able to exchange messages without any problem over the same
> QP. Both ends use SRQs, and enough buffers are posted that I don't run
> out. I even tried doubling the number of posted buffers, and I see the
> same problem again: one message is lost. ibv_post_send doesn't report
> any error. I am trying to get this done for a conference deadline early
> next week. I would really appreciate any suggestions about what might
> cause the message to be dropped without any error being returned.
>   
If you don't have any bugs in your code, the described scenario should work.

I need some more info in order to try to help you:

Do you use the same QP from several threads (and post sends from all of
them)?
How do you poll the CQ (from several threads or from one)?

Which HW/SW do you use?

Thanks
Dotan


