[ofa-general] What is the size of async event queue ?

Or Gerlitz or.gerlitz at gmail.com
Mon Mar 5 01:37:45 PST 2007


Added general at lists.openfabrics.org to the thread; sorry for the double post.

On 3/5/07, Or Gerlitz <or.gerlitz at gmail.com> wrote:
> On 3/2/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>
> >         What is the default size of the async event queue? Suppose I
> > create 1024 QPs from one process to another process and the remote
> > process somehow crashes. Can I get all 1024 QP error async events,
> > and how do I make sure I don't lose an event?
>
> CQ,
>
> I want to understand exactly what feature you need.
>
> For example, if TCP is used, the equivalent is that following a
> remote process crash the remote node's TCP stack closes the TCP
> connections, and whenever the local process attempts to use the socket
> it gets an errno telling it that the connection was closed?
>
> Since you use RC QPs, --if-- you attempt a post_send (send or RDMA) on
> a QP whose connected peer QP is not responding, you will get a CQ
> completion with a "retry exceeded" error (IBV_WC_RETRY_EXC_ERR in
> libibverbs).
>
> If the above case (notification following post send) is not enough,
> the IB CM, which you can use through libibcm or librdmacm, provides the
> same functionality (a DREQ is sent if the process crashes), with the
> distinction that over TCP the same primitive (the socket) is used for
> both connection management and data transfer, whereas over IB the QP
> is used for data and the IB CM ID (or the RDMA CM ID) is used for
> connection management.
>
> Combining the possibilities: if you want a notification on every
> peer process crash, you would need to either poll/select the
> libibcm/librdmacm event queue once in a while or implement a
> keep-alive protocol of your own. For instance, I think the IB spec
> mentions doing a zero-length RDMA write once in a while as a means of
> implementing such a protocol.
>
> Or.
>
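
A minimal sketch of the poll/select option from the quoted message, assuming
the CM event queue is exposed as a file descriptor (with librdmacm this would
be the fd field of struct rdma_event_channel). The helper name fd_readable is
made up for illustration; it only reports whether an event is waiting, so the
caller would still fetch it with rdma_get_cm_event() and check for
RDMA_CM_EVENT_DISCONNECTED.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if data is ready to read, 0 if the timeout expired (or only
 * an error/hangup condition was reported), -1 if poll() itself failed.
 *
 * Assumed usage: pass the fd of the CM event channel and, on a return
 * value of 1, call rdma_get_cm_event() to retrieve the pending event.
 */
static int fd_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    /* Retry if a signal interrupts the wait. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Calling this with a timeout of 0 from the application's main loop gives the
"once in a while" check without blocking the data path.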

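The keep-alive idea in the quoted message could be tracked per peer along
these lines. This is only a sketch under stated assumptions: the ka_* names
and the three-state model are invented for illustration, and the actual probe
is left to the caller, who would post a zero-length RDMA write on the RC QP
(ibv_post_send() with opcode IBV_WR_RDMA_WRITE and num_sge = 0) and feed the
resulting CQ completion status back in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer keep-alive state; all names are illustrative. */
enum ka_state { KA_IDLE, KA_PROBE_OUTSTANDING, KA_PEER_DEAD };

struct ka_peer {
    enum ka_state state;
    uint64_t next_probe_ms;  /* earliest time to post the next probe */
    uint64_t interval_ms;    /* gap between a completion and the next probe */
};

static void ka_init(struct ka_peer *p, uint64_t now_ms, uint64_t interval_ms)
{
    p->state = KA_IDLE;
    p->interval_ms = interval_ms;
    p->next_probe_ms = now_ms + interval_ms;
}

/*
 * Returns true when the caller should post a probe now, and marks the
 * probe as outstanding. The caller is assumed to post a signaled
 * zero-length RDMA write on the QP connected to this peer.
 */
static bool ka_should_probe(struct ka_peer *p, uint64_t now_ms)
{
    if (p->state != KA_IDLE || now_ms < p->next_probe_ms)
        return false;
    p->state = KA_PROBE_OUTSTANDING;
    return true;
}

/*
 * Feed in the probe's completion. success is assumed to mean the work
 * completion status was IBV_WC_SUCCESS; anything else (for example the
 * "retry exceeded" error) marks the peer as dead.
 */
static void ka_on_completion(struct ka_peer *p, bool success, uint64_t now_ms)
{
    if (p->state != KA_PROBE_OUTSTANDING)
        return;
    if (success) {
        p->state = KA_IDLE;
        p->next_probe_ms = now_ms + p->interval_ms;
    } else {
        p->state = KA_PEER_DEAD;
    }
}
```

With 1024 QPs this would mean one struct ka_peer per remote process rather
than per QP, on the assumption that one probe per peer is enough to detect a
process crash.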

