[openib-general] possible cma bug

Sean Hefty mshefty at ichips.intel.com
Thu Jan 26 11:15:41 PST 2006


Steve Wise wrote:
> I'm running the kernel cmatose.  The server side gets a connect request,
> but the init_node() returns an error because the qp create fails.  The
> cmatose module then rejects the connect request on that connect request
> upcall.  Concurrently (on the main work thread running run_server()),
> cmatose calls rdma_destroy_id() on the listening id.  The destroy
> happens before the connect request upcall thread finishes (SMP :).  

rdma_destroy_id() must block while there is a callback outstanding against the
id being destroyed.  The same rule applies one level down, when the destroy is
pushed to the lower-level IB/iWarp code.  The reference counting needs to be
handled by the module invoking the callback, not the module being called.

Imagine the callback is about to be invoked when the user calls
rdma_destroy_id().  rdma_destroy_id() completes, the user releases their memory,
then the callback hits their code.  The user's context is already invalid, so a
reference taken inside the callback doesn't help.
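
A minimal user-space sketch of that rule, using pthreads rather than the
kernel CMA code.  Everything named demo_* is hypothetical and only models the
idea: the dispatching layer counts the callback before invoking it, and
destroy waits until the count drains, so the user's context is never touched
after destroy returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CM id; not the real struct rdma_cm_id. */
struct demo_id {
	pthread_mutex_t lock;
	pthread_cond_t  idle;
	int  callbacks;        /* callbacks currently running against this id */
	bool destroying;       /* set once destroy has started */
	bool user_ctx_valid;   /* models the user's context memory */
	bool saw_invalid_ctx;  /* set if a callback ran on a released context */
};

/* Called by the layer that dispatches the callback, before invoking it. */
static bool demo_callback_begin(struct demo_id *id)
{
	bool ok;

	pthread_mutex_lock(&id->lock);
	ok = !id->destroying;
	if (ok)
		id->callbacks++;
	pthread_mutex_unlock(&id->lock);
	return ok;
}

/* Called by the dispatching layer after the user's handler returns. */
static void demo_callback_end(struct demo_id *id)
{
	pthread_mutex_lock(&id->lock);
	assert(id->callbacks > 0);
	if (--id->callbacks == 0)
		pthread_cond_broadcast(&id->idle);
	pthread_mutex_unlock(&id->lock);
}

/* Blocks while a callback is outstanding; none can start afterwards. */
static void demo_destroy(struct demo_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->destroying = true;
	while (id->callbacks > 0)
		pthread_cond_wait(&id->idle, &id->lock);
	pthread_mutex_unlock(&id->lock);
}

/* Models the upcall thread: count the callback, then run the handler. */
static void *demo_dispatch(void *arg)
{
	struct demo_id *id = arg;

	if (!demo_callback_begin(id))
		return NULL;            /* destroy won the race: skip the upcall */
	if (!id->user_ctx_valid)    /* "user handler" touching its context */
		id->saw_invalid_ctx = true;
	demo_callback_end(id);
	return NULL;
}

/* Returns 0 if the handler never saw a released context, in either ordering. */
int demo_run(void)
{
	struct demo_id id = { .user_ctx_valid = true };
	pthread_t t;

	pthread_mutex_init(&id.lock, NULL);
	pthread_cond_init(&id.idle, NULL);
	if (pthread_create(&t, NULL, demo_dispatch, &id) != 0)
		return -2;
	demo_destroy(&id);
	id.user_ctx_valid = false;  /* user releases memory only after destroy */
	pthread_join(t, NULL);
	return id.saw_invalid_ctx ? -1 : 0;
}
```

The sketch keeps the demo_id itself alive until the thread is joined; in real
code the dispatching layer also has to guarantee that the object holding the
lock and the count outlives the check.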

> I _think_ the solution is to bump the listen_id refcnt at the top of
> cma_req_handler() and iw_conn_req_handler(), and do a cma_deref_id() on
> the listen_id at the end of the functions.

See above.  If the listen_id can be destroyed while the user is in 
cma_req_handler(), then it can be destroyed before the reference can be taken.
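
One way to close that window, sketched under assumptions: the dispatcher
finds the listener and takes its reference in a single step, under the same
lock that destroy uses to unpublish it.  The table, the lock and the demo_*
names are hypothetical and are not taken from the CMA or iWarp CM sources.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_SLOTS 4

/* Hypothetical listener object; refcnt is protected by demo_table_lock. */
struct demo_listener {
	int port;
	int refcnt;
};

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_listener *demo_table[DEMO_SLOTS];

/* Make the listener visible to incoming requests. Returns 0 on success. */
int demo_publish(struct demo_listener *l)
{
	int i, rc = -1;

	pthread_mutex_lock(&demo_table_lock);
	for (i = 0; i < DEMO_SLOTS; i++) {
		if (!demo_table[i]) {
			demo_table[i] = l;
			rc = 0;
			break;
		}
	}
	pthread_mutex_unlock(&demo_table_lock);
	return rc;
}

/* First step of destroy: after this returns, no new lookup can find l. */
void demo_unpublish(struct demo_listener *l)
{
	int i;

	pthread_mutex_lock(&demo_table_lock);
	for (i = 0; i < DEMO_SLOTS; i++) {
		if (demo_table[i] == l)
			demo_table[i] = NULL;
	}
	pthread_mutex_unlock(&demo_table_lock);
}

/* Dispatcher side: lookup and reference happen in one locked step. */
struct demo_listener *demo_lookup_get(int port)
{
	struct demo_listener *found = NULL;
	int i;

	pthread_mutex_lock(&demo_table_lock);
	for (i = 0; i < DEMO_SLOTS; i++) {
		if (demo_table[i] && demo_table[i]->port == port) {
			found = demo_table[i];
			found->refcnt++;
			break;
		}
	}
	pthread_mutex_unlock(&demo_table_lock);
	return found;
}

/* Drop a reference; returns the remaining count. */
int demo_put(struct demo_listener *l)
{
	int left;

	pthread_mutex_lock(&demo_table_lock);
	left = --l->refcnt;
	pthread_mutex_unlock(&demo_table_lock);
	return left;
}
```

A reference taken at the top of the handler instead would come after the
lookup, which leaves exactly the gap described above.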

Does the iWarp CM take a reference on the corresponding listen_id before 
invoking a connect request callback?

- Sean


