[ofa-general] IB post send lost.

Bharath Ramesh bramesh at vt.edu
Wed Nov 7 16:28:31 PST 2007


I have a multi-threaded application with its own message exchange
protocol, which uses IB as the communication layer. I send a lot of
messages, normally on the order of a few tens of thousands. After some
time, one message from one of the nodes appears to be lost. I am using
the RC QP type. The lost message causes the thread to deadlock. The
other threads are still able to exchange messages over the same QP
without any problem. Both ends use SRQs, and enough buffers are posted
that I do not run out. I even tried doubling the number of posted
buffers, but I see the same problem again: one message is lost.
ibv_post_send does not report any error. I am trying to get this done
for a conference deadline early next week. I would really appreciate
any suggestions about what might cause a message to be dropped without
any error being returned.

Thanks,

Bharath

---
Bharath Ramesh       <bramesh at vt.edu>       http://people.cs.vt.edu/~bramesh



