[ofa-general] What is the size of async event queue ?

Or Gerlitz or.gerlitz at gmail.com
Mon Mar 5 01:36:34 PST 2007


On 3/2/07, Tang, Changqing <changquing.tang at hp.com> wrote:

>         What is the default size of the async event queue?  Suppose I
> create 1024 QPs from one process to another process and the remote
> process somehow crashes. Can I get all 1024 QP error async events,
> and how do I make sure I don't lose an event?

CQ,

I want to understand exactly which feature you need.

For example, if TCP is used, the equivalent is that following a remote
process crash, the remote node's TCP stack closes the TCP connections,
and whenever the local process attempts to use the socket it gets an
errno telling it that the connection was closed. Is that the behavior
you are after?
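
To make the TCP analogy concrete, here is a minimal sketch in C,
assuming a POSIX/Linux sockets environment. The helper names
peer_closed() and demo_peer_crash() are made up for illustration.
Closing the accepted socket stands in for the remote process crashing,
since the kernel closes a dead process's sockets on its behalf.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer closed or reset the connection,
 * 0 if it still looks open, -1 on any other error. */
int peer_closed(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n > 0)
        return 0;                     /* data pending, connection alive */
    if (n == 0)
        return 1;                     /* EOF: the peer's stack sent a FIN */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                     /* nothing to read yet */
    if (errno == ECONNRESET || errno == EPIPE)
        return 1;                     /* the peer's stack reset the connection */
    return -1;
}

/* Hypothetical demo: a loopback connection whose server side goes away.
 * Stores the peer_closed() result before and after the close.
 * Returns 0 on success, -1 if the setup failed. */
int demo_peer_crash(int *before, int *after)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;
    int lfd, cfd, sfd;
    int rc = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    *before = peer_closed(cfd);       /* connection is healthy here */
    close(sfd);                       /* stands in for the remote crash */

    /* Wait (up to one second) for the FIN to reach the client side. */
    pfd.fd = cfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) <= 0)
        goto out;

    *after = peer_closed(cfd);        /* local side now sees the close */
    rc = 0;
out:
    if (lfd >= 0)
        close(lfd);
    if (cfd >= 0)
        close(cfd);
    return rc;
}
```

Note that the local process learns about the close only when it touches
the socket (here via recv), which is the point of the comparison.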

Since you use RC QPs, --if-- you attempt a post_send (or RDMA) to a
QP whose connected peer QP is not responding, you will get a CQ
completion with a "retry exceeded" error.

If the above (notification only after a post send) is not enough,
the IB CM, which you can use through libibcm or librdmacm, provides the
same functionality (a DREQ is sent if the process crashes), with the
distinction that over TCP the same primitive (the socket) is used for
both connection management and data transfer, whereas over IB the QP
is used for data and the IB CM ID (or the RDMA CM ID) is used for
connection management.

Combining the possibilities: if you want a notification on every
peer process crash, you would need to either poll/select the
libibcm/librdmacm event queue once in a while or implement a
keepalive protocol of your own. For instance, I think the IB spec
mentions doing a zero-length RDMA write once in a while as a means of
implementing such a protocol.
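
As a rough sketch of the bookkeeping such a keepalive protocol might
need, here is some plain C with no verbs calls in it. The struct and
function names are hypothetical and not part of libibverbs. The
assumption is that the application posts a zero-length RDMA write per
QP with ibv_post_send() whenever a probe is due, and feeds the outcome
of each work completion from ibv_poll_cq() (IBV_WC_SUCCESS versus an
error status such as IBV_WC_RETRY_EXC_ERR) into peer_on_completion().

```c
#include <stdbool.h>

/* Hypothetical per-peer liveness record, one per QP. Times are in seconds. */
struct peer_state {
    long last_ok;   /* time of the last successful probe completion */
    bool failed;    /* a probe completed in error, e.g. retry exceeded */
};

/* Start tracking a peer that is known to be alive at time 'now'. */
void peer_init(struct peer_state *p, long now)
{
    p->last_ok = now;
    p->failed = false;
}

/* True when at least 'interval' seconds passed since the last success,
 * i.e. the application should post another zero-length probe. */
bool peer_probe_due(const struct peer_state *p, long now, long interval)
{
    return !p->failed && (now - p->last_ok) >= interval;
}

/* Record the outcome of a probe's work completion. */
void peer_on_completion(struct peer_state *p, bool success, long now)
{
    if (success)
        p->last_ok = now;
    else
        p->failed = true;
}

/* A peer is considered dead once a probe failed, or when no probe
 * succeeded for more than 'timeout' seconds. */
bool peer_is_dead(const struct peer_state *p, long now, long timeout)
{
    return p->failed || (now - p->last_ok) > timeout;
}
```

With 1024 QPs this would be an array of 1024 such records scanned from
a timer. The timeout should be chosen larger than the probe interval
plus the time the QP needs to exhaust its retries.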

Or.



