[openib-general] possible cma bug

Steve Wise swise at opengridcomputing.com
Thu Jan 26 12:25:42 PST 2006


OK.  I'll try taking a reference on the iw_cm_id before upcalling into the
cma.  That should have the same effect.
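
Below is a minimal userspace sketch of this pattern, written with pthreads rather than kernel code. The module that invokes the callback takes a reference on the id before the upcall and drops it afterwards, and destroy blocks until no upcall holds a reference. `struct cm_id`, `cm_deliver_event()`, `cm_destroy_id()` and the demo function are hypothetical names for illustration only. The real iw_cm and cma code use their own refcount and wait primitives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for an iw_cm_id; not the kernel structure. */
struct cm_id {
    pthread_mutex_t lock;
    pthread_cond_t drained;              /* signalled when refcount reaches 0 */
    int refcount;                        /* upcalls currently using this id */
    void (*handler)(struct cm_id *id);   /* consumer's callback, e.g. the cma */
};

void cm_id_init(struct cm_id *id, void (*handler)(struct cm_id *))
{
    pthread_mutex_init(&id->lock, NULL);
    pthread_cond_init(&id->drained, NULL);
    id->refcount = 0;
    id->handler = handler;
}

static void cm_id_get(struct cm_id *id)
{
    pthread_mutex_lock(&id->lock);
    id->refcount++;
    pthread_mutex_unlock(&id->lock);
}

static void cm_id_put(struct cm_id *id)
{
    pthread_mutex_lock(&id->lock);
    assert(id->refcount > 0);
    if (--id->refcount == 0)
        pthread_cond_broadcast(&id->drained);
    pthread_mutex_unlock(&id->lock);
}

/* Invoker side: hold a reference for the whole duration of the upcall. */
void cm_deliver_event(struct cm_id *id)
{
    cm_id_get(id);
    id->handler(id);
    cm_id_put(id);
}

/* Destroy side: block until no upcall holds a reference. */
void cm_destroy_id(struct cm_id *id)
{
    pthread_mutex_lock(&id->lock);
    while (id->refcount > 0)
        pthread_cond_wait(&id->drained, &id->lock);
    pthread_mutex_unlock(&id->lock);
    pthread_cond_destroy(&id->drained);
    pthread_mutex_destroy(&id->lock);
    /* The owner may free the memory behind id after this returns. */
}

/* ---- Demo: a slow connect-request upcall racing with destroy ---- */
static atomic_int handler_started, handler_finished;

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void slow_handler(struct cm_id *id)
{
    (void)id;
    atomic_store(&handler_started, 1);
    sleep_ms(100);                       /* pretend to create a QP, reject, etc. */
    atomic_store(&handler_finished, 1);
}

static void *event_thread(void *arg)
{
    cm_deliver_event(arg);
    return NULL;
}

/* Returns 1 if cm_destroy_id() returned only after the upcall finished. */
int demo_destroy_waits_for_upcall(void)
{
    struct cm_id id;
    pthread_t t;

    atomic_store(&handler_started, 0);
    atomic_store(&handler_finished, 0);
    cm_id_init(&id, slow_handler);
    pthread_create(&t, NULL, event_thread, &id);
    while (!atomic_load(&handler_started))
        sleep_ms(1);                     /* the upcall, and its reference, is in flight */
    cm_destroy_id(&id);
    int finished = atomic_load(&handler_finished);
    pthread_join(t, NULL);
    return finished;
}
```

This sketch only closes the window if `cm_id_get()` runs at a point that destroy cannot race past. If destroy can complete before the invoker takes its reference, the id is already gone by the time the upcall starts.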

On Thu, 2006-01-26 at 11:15 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > I'm running the kernel cmatose.  The server side gets a connect request,
> > but the init_node() returns an error because the qp create fails.  The
> > cmatose module then rejects the connect request on that connect request
> > upcall.  Concurrently (on the main work thread running run_server()),
> > cmatose calls rdma_destroy_id() on the listening id.  The destroy
> > happens before the connect request upcall thread finishes (SMP :).  
> 
> rdma_destroy_id() must block while there is a callback outstanding against the 
> id being destroyed.  This is true when pushed down to the lower level IB/iWarp 
> code.  The reference counting needs to be handled by the module invoking the 
> callback, not the module being called.
> 
> Imagine if the callback is about to be invoked when the user calls 
> rdma_destroy_id().  rdma_destroy_id() completes, the user releases their memory, 
> then the callback hits their code.  The user's context is already invalid, so a
> reference doesn't help.
> 
> > I _think_ the solution is to bump the listen_id refcnt at the top of
> > cma_req_handler() and iw_conn_req_handler(), and do a cma_deref_id() on
> > the listen_id at the end of the functions.
> 
> See above.  If the listen_id can be destroyed while the user is in 
> cma_req_handler(), then it can be destroyed before the reference can be taken.
> 
> Does the iWarp CM take a reference on the corresponding listen_id before 
> invoking a connect request callback?
> 
> - Sean
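
Sean's point can be sketched the same way. The reference has to be taken by the code that finds the id, under the same lock that destroy uses to unpublish it. A lookup then either obtains a reference that destroy will wait for, or it finds nothing and the request can be rejected. Everything here is a hypothetical userspace illustration, not the CMA or iWARP CM implementation: the fixed-size listener table, `listen_lookup_get()`, `listen_destroy()`, and the single global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical listening id; not the kernel's rdma_cm_id. */
struct listen_id {
    int port;
    int refcount;                 /* callbacks in flight; protected by table_lock */
};

#define MAX_LISTENERS 8
static struct listen_id *listeners[MAX_LISTENERS];   /* hypothetical lookup table */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t refs_drained = PTHREAD_COND_INITIALIZER;

int listen_publish(struct listen_id *id)
{
    int rc = -1;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_LISTENERS; i++) {
        if (listeners[i] == NULL) {
            listeners[i] = id;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* Event side: find the listener AND take the reference under one lock. */
struct listen_id *listen_lookup_get(int port)
{
    struct listen_id *found = NULL;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_LISTENERS; i++) {
        if (listeners[i] != NULL && listeners[i]->port == port) {
            found = listeners[i];
            found->refcount++;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;                 /* NULL: listener is gone, reject the request */
}

void listen_put(struct listen_id *id)
{
    pthread_mutex_lock(&table_lock);
    assert(id->refcount > 0);
    if (--id->refcount == 0)
        pthread_cond_broadcast(&refs_drained);
    pthread_mutex_unlock(&table_lock);
}

/* Destroy side: unpublish first, then wait for callbacks already holding a ref. */
void listen_destroy(struct listen_id *id)
{
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_LISTENERS; i++)
        if (listeners[i] == id)
            listeners[i] = NULL;
    while (id->refcount > 0)
        pthread_cond_wait(&refs_drained, &table_lock);
    pthread_mutex_unlock(&table_lock);
    /* No lookup can return id any more and none is in flight: safe to free. */
}
```

A reference taken inside the handler itself (for example at the top of cma_req_handler()) would come only after `listen_lookup_get()` has already handed out the pointer. That is the window described in the thread, which is why the reference belongs to the module invoking the callback.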
