[openib-general] possible cma bug
swise at opengridcomputing.com
Thu Jan 26 12:25:42 PST 2006
OK. I'll try taking a reference on the iw_cm_id before upcalling into the
cma. That should have the same effect.
On Thu, 2006-01-26 at 11:15 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > I'm running the kernel cmatose. The server side gets a connect request,
> > but init_node() returns an error because the QP create fails. The
> > cmatose module then rejects the connect request from within that
> > upcall. Concurrently (on the main work thread running run_server()),
> > cmatose calls rdma_destroy_id() on the listening id. The destroy
> > happens before the connect request upcall thread finishes (SMP :).
> rdma_destroy_id() must block while there is a callback outstanding against the
> id being destroyed. This is true when pushed down to the lower level IB/iWarp
> code. The reference counting needs to be handled by the module invoking the
> callback, not the module being called.
> Imagine if the callback is about to be invoked when the user calls
> rdma_destroy_id(). rdma_destroy_id() completes, the user releases their memory,
> then the callback hits their code. The user's context is already invalid, so a
> reference doesn't help.
> > I _think_ the solution is to bump the listen_id refcnt at the top of
> > cma_req_handler() and iw_conn_req_handler(), and do a cma_deref_id() on
> > the listen_id at the end of the functions.
> See above. If the listen_id can be destroyed while the user is in
> cma_req_handler(), then it can be destroyed before the reference can be taken.
> Does the iWarp CM take a reference on the corresponding listen_id before
> invoking a connect request callback?
> - Sean
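
Sean's rule above (destroy must block while a callback is outstanding, and
the counting belongs to the module that invokes the callback) can be
sketched in userspace C with pthreads. This is only an illustration under
stated assumptions: struct demo_id, demo_id_init(), demo_invoke_callback(),
demo_destroy_id() and demo_count_handler() are hypothetical names, not the
real rdma_cm or iw_cm code, and the id's storage is assumed to be owned by
the caller and to outlive the destroy call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CM identifier; NOT the real rdma_cm_id. */
struct demo_id {
    pthread_mutex_t lock;
    pthread_cond_t  idle;             /* signalled when no callback is running */
    int             callbacks_active; /* maintained by the invoking layer only */
    bool            destroying;       /* set once destroy has been requested */
    void          (*handler)(void *context);
    void           *context;          /* user memory, freed after destroy */
};

void demo_id_init(struct demo_id *id, void (*handler)(void *), void *context)
{
    pthread_mutex_init(&id->lock, NULL);
    pthread_cond_init(&id->idle, NULL);
    id->callbacks_active = 0;
    id->destroying = false;
    id->handler = handler;
    id->context = context;
}

/*
 * Invoker side (the lower layer). The count is taken under the same lock
 * that destroy uses, so either destroy sees the callback and waits for it,
 * or the callback sees "destroying" and is never delivered.
 * Returns true if the handler was called.
 */
bool demo_invoke_callback(struct demo_id *id)
{
    pthread_mutex_lock(&id->lock);
    if (id->destroying) {
        pthread_mutex_unlock(&id->lock);
        return false;
    }
    id->callbacks_active++;
    pthread_mutex_unlock(&id->lock);

    id->handler(id->context);         /* user code runs without the lock held */

    pthread_mutex_lock(&id->lock);
    assert(id->callbacks_active > 0);
    if (--id->callbacks_active == 0)
        pthread_cond_broadcast(&id->idle);
    pthread_mutex_unlock(&id->lock);
    return true;
}

/* Destroy side: blocks until every outstanding callback has returned. */
void demo_destroy_id(struct demo_id *id)
{
    pthread_mutex_lock(&id->lock);
    id->destroying = true;
    while (id->callbacks_active > 0)
        pthread_cond_wait(&id->idle, &id->lock);
    pthread_mutex_unlock(&id->lock);
    /* From here on the user may release the memory behind id->context. */
}

/* Example handler: treats the context as an int counter and bumps it. */
void demo_count_handler(void *context)
{
    (*(int *)context)++;
}
```

Note that the handler itself never touches the count. A reference taken
inside the handler would come too late, since the destroy could already have
completed before the handler's first instruction runs. In this sketch, a
call to demo_invoke_callback() made after demo_destroy_id() has returned
simply reports false and leaves the user's context alone.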