[ofa-general] What is the size of async event queue ?

Tang, Changqing changquing.tang at hp.com
Mon Mar 5 08:17:18 PST 2007


Or:
	Thank you for the description. I have read the spec carefully
and have a general idea, but here is a case I am not sure about.

	I have 1024 QPs on a single port/cable. No receives are posted
because I use only RDMA writes, and there are no pending sends either.
At this point I pull the cable out.

	I will get the port error event (right?). Do I also get 1024 QP
error events, since there is no way to report the error through a
completion status? Or do the QPs stay in a good state even though the
cable has been pulled?


--CQ

 

> -----Original Message-----
> From: Or Gerlitz [mailto:or.gerlitz at gmail.com] 
> Sent: Monday, March 05, 2007 3:37 AM
> To: Tang, Changqing
> Cc: Roland Dreier; openib-general at openib.org
> Subject: Re: [ofa-general] What is the size of async event queue ?
> 
> On 3/2/07, Tang, Changqing <changquing.tang at hp.com> wrote:
> 
> > What is the default size of the async event queue? Suppose I
> > create 1024 QPs from one process to another process, and somehow
> > the remote process crashes. Can I get all 1024 QP error async
> > events? How do I make sure I don't lose an event?
> 
> CQ,
> 
> I want to understand exactly what feature you need.
> 
> For example, if TCP is used, the equivalent is that following
> a remote process crash, the remote node's TCP stack closes the
> TCP connections, and whenever the local process attempts to
> use the socket it gets an errno saying the connection was
> closed?
> 
> Since you use RC QPs, --if-- you attempt a post_send (or an
> RDMA operation) on a QP whose connected peer QP is not
> responding, you will get a CQ completion with a "retry
> exceeded" error.
> 
> If the above case (notification following a post send) is not
> enough, the IB CM, which you can use through libibcm or
> librdmacm, provides the same functionality (it sends a DREQ if
> the process crashes). The distinction is that over TCP the
> same primitive (the socket) is used for both connection
> management and data transfer, whereas over IB the QP is used
> for data and the IB CM ID (or the RDMA CM ID) is used for
> connection management.
> 
> Combining the possibilities: if you want to get a notification
> on every peer process crash, you would need to either
> poll/select the libibcm/librdmacm event queue once in a while
> or implement a keep-alive protocol of your own. For instance,
> I think the IB spec mentions doing a zero-length RDMA write
> once in a while as a means of implementing such a protocol.
> 
> Or.
> 
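A minimal C sketch of the bookkeeping side of such a keep-alive scheme
is below. It assumes each peer is reached through one RC QP and that
some probe (for example the zero-length RDMA write mentioned above,
posted with ibv_post_send) is issued periodically; posting the probe
and polling its completion are not shown. The names peer_state,
record_probe and sweep_peers, and the timeout value, are hypothetical
and chosen only for illustration. A probe whose completion reports an
error such as "retry exceeded" marks the peer failed at once, and a
peer with no successful probe within the timeout is marked failed by
the periodic sweep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-peer state: one entry per RC QP / remote peer. */
struct peer_state {
    int    qp_index;  /* index of the QP this peer is reached through */
    time_t last_ok;   /* time of the last successfully completed probe */
    bool   failed;    /* set once the peer is considered dead */
};

/* Hypothetical helper: record the outcome of one keep-alive probe.
 * probe_ok would be derived from the probe's completion status
 * (success vs. an error such as "retry exceeded"). */
static void record_probe(struct peer_state *p, bool probe_ok, time_t now)
{
    assert(p != NULL);
    if (probe_ok)
        p->last_ok = now;
    else
        p->failed = true;  /* error completion: treat the peer as gone */
}

/* Hypothetical helper: mark peers whose last successful probe is older
 * than timeout_sec as failed. Returns how many peers are now failed. */
static size_t sweep_peers(struct peer_state *peers, size_t n,
                          time_t now, double timeout_sec)
{
    assert(peers != NULL || n == 0);
    size_t dead = 0;
    for (size_t i = 0; i < n; i++) {
        if (!peers[i].failed &&
            difftime(now, peers[i].last_ok) > timeout_sec)
            peers[i].failed = true;
        if (peers[i].failed)
            dead++;
    }
    return dead;
}
```

For example, with three peers last heard from at t = 1000, a
successful probe from peer 0 at t = 1008, a failed probe to peer 1,
and a sweep at t = 1010 with a 5-second timeout, sweep_peers reports
2 failed peers (peers 1 and 2). Keeping the timeout check separate
from the completion handling means a peer that silently stops
answering is still detected even if no probe completion ever arrives.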



