[openib-general] possible cma bug

Steve Wise swise at opengridcomputing.com
Thu Jan 26 11:04:28 PST 2006


Sean,

While debugging some iwarp connection setup problems, I _might_ have
stumbled onto a cma bug.

I'm running the kernel cmatose.  The server side gets a connect request,
but init_node() returns an error because the QP create fails.  The
cmatose module then rejects the connect request from within that
upcall.  Concurrently, on the main work thread running run_server(),
cmatose calls rdma_destroy_id() on the listening id.  The destroy
happens before the connect request upcall thread finishes (SMP :).

Then, as the thread handling the connect request upcall unwinds the
stack and finishes processing in iw_conn_req_handler(), the system
oopses in cma_release_remove() at line 1048 (with the iwarp cma patch).
I think the oops occurs because the listen_id was already destroyed and
iw_conn_req_handler() didn't hold a reference to it, so
cma_release_remove() ends up touching freed memory.

I _think_ the solution is to bump the listen_id refcnt at the top of
cma_req_handler() and iw_conn_req_handler(), and do a cma_deref_id() on
the listen_id at the end of the functions.
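
To illustrate the idea, here is a minimal userspace sketch of the
get/put pattern, not the actual cma code.  Every sketch_* name is
hypothetical, and the "concurrent" destroy is simulated inline on one
thread.  It also simplifies things by letting the last put free the
object, whereas I believe the real rdma_destroy_id() waits for the
outstanding references to drop before freeing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cma's private id structure. */
struct sketch_id {
    atomic_int refcount;  /* starts at 1: the owner's reference */
    bool *freed_flag;     /* set when the object is released (demo only) */
};

struct sketch_id *sketch_id_create(bool *freed_flag)
{
    struct sketch_id *id = malloc(sizeof(*id));

    if (!id)
        return NULL;
    atomic_init(&id->refcount, 1);
    id->freed_flag = freed_flag;
    *freed_flag = false;
    return id;
}

/* Take a reference (the "bump the refcnt" step). */
void sketch_id_get(struct sketch_id *id)
{
    atomic_fetch_add(&id->refcount, 1);
}

/* Drop a reference; the last one out frees the object. */
void sketch_id_put(struct sketch_id *id)
{
    bool *flag = id->freed_flag;

    /* atomic_fetch_sub() returns the value before the decrement. */
    if (atomic_fetch_sub(&id->refcount, 1) == 1) {
        free(id);
        *flag = true;
    }
}

/* Models the owner destroying the id: drops the owner's reference. */
void sketch_destroy_id(struct sketch_id *id)
{
    sketch_id_put(id);
}

/*
 * Models the connect request upcall.  If concurrent_destroy is set, the
 * destroy that would run on another CPU is simulated in the middle of
 * the handler.  Returns true if the listen id was still alive at the
 * point where the handler would touch it while unwinding.
 */
bool sketch_conn_req_handler(struct sketch_id *listen_id,
                             const bool *freed_flag,
                             bool concurrent_destroy)
{
    bool alive;

    sketch_id_get(listen_id);          /* pin the listen id at the top */

    if (concurrent_destroy)
        sketch_destroy_id(listen_id);  /* owner goes away mid-upcall */

    alive = !*freed_flag;              /* handler touches listen_id here */

    sketch_id_put(listen_id);          /* deref at the end of the handler */
    return alive;
}
```

With the get/put pair in place the listen id outlives the handler even
when the destroy lands mid-upcall; without the sketch_id_get() at the
top, the simulated destroy would drop the count to zero and the handler
would be touching freed memory.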

I added this logic to the iwarp side and it appears to have fixed the
problem. 

Steve.






