[ofa-general] What is the size of async event queue ?
Tang, Changqing
changquing.tang at hp.com
Mon Mar 5 08:17:18 PST 2007
Or:
Thank you for the description. I have read the spec carefully
and have a better idea now, but here is a case I am still unsure about.
I have 1024 QPs on a single port/cable. No receives are posted
because I use pure RDMA write, and there are no pending sends either.
At this point I pull the cable out.
I will get the port error event (right?). Do I also get 1024 QP
error events, given that with nothing outstanding there is no way to
report the failure through a completion status? Or do the QPs stay in
a good state even though I pulled the cable out?
--CQ
> -----Original Message-----
> From: Or Gerlitz [mailto:or.gerlitz at gmail.com]
> Sent: Monday, March 05, 2007 3:37 AM
> To: Tang, Changqing
> Cc: Roland Dreier; openib-general at openib.org
> Subject: Re: [ofa-general] What is the size of async event queue ?
>
> On 3/2/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>
> > What is the default size of the async event queue? Suppose I
> > create 1024 QPs from one process to another process. If the remote
> > process somehow crashes, can I get all 1024 QP error async events?
> > How do I make sure I don't lose an event?
>
> CQ,
>
> I want to understand exactly what feature you need.
>
> For example, is it the equivalent of what happens with TCP, where
> following a remote process crash the remote node's TCP stack
> closes the TCP connections, and whenever the local process
> attempts to use the socket it gets an errno saying that the
> connection was closed?
>
> Since you use RC QPs, --if-- you attempt a post_send (or
> RDMA) on a QP whose connected peer QP is not responding, you
> will get a CQ completion with a "retry exceeded" error.
>
> If the above case (notification following a post send) is not
> enough, the IB CM, which you can use through libibcm or
> librdmacm, provides the same functionality (it sends a DREQ if the
> process crashes), with the distinction that over TCP the same
> primitive (the socket) is used for both connection management and
> data transfer, whereas over IB the QP is used for data and the
> IB CM Id (or the RDMA CM Id) is used for connection management.
>
> Combining possibilities: if you want to get a notification on
> every peer process crash, you would need to either
> poll/select once in a while on the libibcm/librdmacm event queue
> or implement a keep-alive protocol of your own. For
> instance, I think the IB spec mentions doing a zero-length RDMA
> write once in a while as a means of implementing such a protocol.
>
> Or.
>
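The keep-alive idea in the quoted message can be sketched as a small
piece of bookkeeping. This is only a model under stated assumptions:
it issues no verbs calls, and the names KeepAliveTracker,
on_probe_completion, and the integer peer IDs are hypothetical. The
application is assumed to post a periodic probe (for example a
zero-length RDMA write) on each QP and to feed the resulting
completion status into the tracker.

```python
import time


class KeepAliveTracker:
    """Hypothetical bookkeeping for an application-level keep-alive.

    A peer is reported dead if a probe completed in error (for example
    "retry exceeded") or if no probe has succeeded within timeout_s.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested
        self.last_ok = {}       # peer_id -> time of last successful probe
        self.failed = set()     # peers whose probe completed in error

    def add_peer(self, peer_id):
        # Treat connection setup as the first sign of life.
        self.last_ok[peer_id] = self.clock()

    def on_probe_completion(self, peer_id, success):
        # Call this with the status of each probe completion.
        if success:
            self.last_ok[peer_id] = self.clock()
        else:
            # A failed peer stays failed; this sketch has no recovery path.
            self.failed.add(peer_id)

    def dead_peers(self):
        now = self.clock()
        stale = {p for p, t in self.last_ok.items()
                 if now - t > self.timeout_s}
        return sorted(stale | self.failed)
```

With 1024 QPs, one tracker entry per QP (or per remote process) would
let the application notice a dead peer even when it has no receives
posted and no other sends pending, at the cost of the periodic probe
traffic.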