[openib-general] possible dapl bug?
Steve Wise
swise at opengridcomputing.com
Mon Apr 3 14:05:03 PDT 2006
Hey James,
I'm trying to get dapltest to run over the Chelsio RNIC in user mode.
I'm running into an intermittent failure where the server side fails to
properly clean up its resources. It has to do with disconnect vs. EP
freeing (go figure :). Basically, if the disconnect event handler thread
hasn't finished (and cleared in_callback) before the main dapltest
thread attempts to destroy the EP, then dat_ep_free() returns success
("ok, I freed it") even though it didn't actually free the EP, because
in_callback == 1 in the dapl_cm_id struct. The main thread then tries to
free the EVDs and the PZ and gets errors because they are still in use.
So dapli_destroy_conn() defers destroying the ib_qp while
conn->in_callback is set. This, however, leaves the dapltest program
trying to destroy CQs and PDs that still have a QP attached to them.
I'll keep poking at this, but I wanted to bring it to your attention
now. I haven't seen this on IB, but I hit it regularly on iWARP. Either
way, it looks like a bug, since dat_ep_free() reports that the EP was
destroyed when it really wasn't...
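One possible direction for making the return value honest is to have the destroy path wait for the running callback to finish instead of deferring. This is a hypothetical sketch, not taken from the DAPL source: it uses a pthread mutex and condition variable, and `struct conn`, `conn_init()`, `cm_event_thread()`, `destroy_conn_sync()` and `destroy_cq()` are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model objects, NOT the real uDAPL/openib structures. */
struct cq {
    int qp_refs;                 /* QPs still attached to this CQ */
};

struct conn {
    pthread_mutex_t lock;
    pthread_cond_t cb_done;      /* broadcast when in_callback drops to false */
    bool in_callback;
    bool qp_alive;
    struct cq *cq;
};

static void conn_init(struct conn *c, struct cq *cq)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cb_done, NULL);
    c->in_callback = false;
    c->qp_alive = true;
    c->cq = cq;
    cq->qp_refs++;               /* the new QP references the CQ */
}

/* Stand-in for the CM event thread finishing a slow disconnect callback. */
static void *cm_event_thread(void *arg)
{
    struct conn *c = arg;
    struct timespec delay = { 0, 50 * 1000 * 1000 };   /* 50 ms */

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&c->lock);
    c->in_callback = false;
    pthread_cond_broadcast(&c->cb_done);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Blocking destroy: wait until no callback is running, then free the QP,
 * so a return value of 0 really means the QP is gone. */
static int destroy_conn_sync(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->in_callback)
        pthread_cond_wait(&c->cb_done, &c->lock);
    if (c->qp_alive) {
        c->qp_alive = false;
        c->cq->qp_refs--;
    }
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* A CQ cannot be destroyed while any QP still references it. */
static int destroy_cq(struct cq *cq)
{
    assert(cq->qp_refs >= 0);
    return cq->qp_refs > 0 ? -EBUSY : 0;
}
```

A caller would set in_callback before handing the conn to the event thread, start cm_event_thread() with pthread_create(), and then call destroy_conn_sync(); once that returns, destroy_cq() succeeds. The trade-off is that a blocking destroy would deadlock if it were ever invoked from inside the callback thread itself, so a real provider would have to handle that case separately.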
Steve.
--------- trace of server side dapltest ---------
Test[b0df]: End Successfully
dapl_lmr_free (0x8084598)
dapl_lmr_free (0x808b758)
dapl_lmr_free (0x8085e78)
dapl_ep_disconnect (0x8083f00, 0)
disconnect(ep 0x8083f00, conn 0x80872a0, id 134749624 flags 0)
ib_thread(7575) poll_event: async=0x0 pipe=0x0 cm=0x1 cq=0x0
cm_event()
cm_event: EVENT=10 ID=0x8081db8 LID=0x4015aae8 CTX=0x80872a0
passive_cb: conn 0x80872a0 id 134749624 event 10
--> dapl_cr_callback! context: 0x8085f28 event: 1 cm_handle 0x80872a0
dapls_ib_get_dat_event: event(passive) ib=0x1 dat=0x4005
--> dapli_get_sp_ep! disconnect dump sp: 0x8085f28
remove_listen(ia_ptr 0x8077220 sp_ptr 0x8085f28 cm_ptr 0x8085f98)
destroy_conn: conn 0x8085f98 id 134766784
--> dapl_evd_connection_callback: ctxt: 0x8083f00 event: 1 cm_handle 0x80872a0
dapls_ib_get_dat_event: event(passive) ib=0x1 dat=0x4005
dapli_evd_post_event: Called with event # 4005
dapl_evd_connection_callback () returns
dapl_ep_disconnect () returns 0x0
dapl_evd_wait (0x8082da0, -1, 1, 0x403b48c0, 0x403b4894)
dapl_evd_wait: EVD 0x8082da0, CQ (nil)
dapl_evd_wait () returns 0x0
dapl_evd_dequeue (0x8081ea8, 0x403b4940)
dapl_evd_dequeue () returns 0xd0000
dapl_ep_free (0x8083f00)
dapl_ep_disconnect (0x8083f00, 0)
dapl_ep_disconnect () returns 0x0
dapl_ep_free: Free EP: b, ep 0x8083f00 qp_state 1 qp_handle 80872a0
qp_free: ep_ptr 0x8083f00 qp 0x80872a0
destroy_conn: conn 0x80872a0 id 134749624
dapli_destroy_conn IN CALLBACK!
dapl_evd_free (0x8082da0)
dapl_evd_free () returns 0x0
dapl_evd_free (0x8082b98)
dapl_evd_free () returns 0x0
dapl_evd_free (0x8082520)
destroy_cq Device or resource busy
dapl_evd_free () returns 0x110000
Test[b0df]: dat_evd_free (reqt) error: DAT_PROVIDER_IN_USE
dapl_evd_free (0x8081ea8)
destroy_cq Device or resource busy
dapl_evd_free () returns 0x110000
Test[b0df]: dat_evd_free (recv) error: DAT_PROVIDER_IN_USE
dapl_pz_free (0x807a390)
dealloc_pd Device or resource busy
Test[b0df]: dat_pz_free error: DAT_PROVIDER_IN_USE
Test[b0df]: cleanup is done
passive_cb: destroy 1 in_callback 1
cma_cb: DESTROY conn 0x80872a0 cm_id 0x8081db8 qp 0x8084790
Server: Transaction Test Finished for this client