[ofw] CM ref counting issues...
Sean Hefty
sean.hefty at intel.com
Mon Dec 7 12:19:24 PST 2009
Running ndconn over winverbs results in the IB CM hanging. I'm running the
client and server on a single system, and both sides hang during cleanup. It's
possible there is a problem in the winverbs driver, but it looks more like some
issue in ibal to me.
I modified cm_destroy_id and kal_cep_destroy to display some information when
the hang occurs:
cm_destroy_id()
    calls kal_cep_destroy
    wait 10 seconds for the destroy callback
    if wait timed out display the cid
    free the cid

kal_cep_destroy()
    locks
    ref_cnt = __cleanup_cep() // decrements ref_cnt
    unlock
    if ref_cnt > 0 then display cid information
    ^^^ this isn't necessarily an error
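
To make the two instrumented paths concrete, here is a rough single-threaded model in C. It is only a sketch, not the ibal code: every name in it (model_cep, model_deref, model_cep_destroy, model_destroy_id, run_scenario) is made up for illustration, the timed wait is replaced by a count of MADs that release their reference before the wait would end, and I am assuming each outstanding MAD holds exactly one reference on the cep.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; none of these names exist in ibal. */
#define WAIT_OK        0x0u
#define WAIT_TIMED_OUT 0x102u  /* numerically the same as NT STATUS_TIMEOUT */

struct model_cep {
    unsigned cid;
    int ref_cnt;    /* assumed: 1 for the owner + 1 per outstanding MAD */
    int destroyed;  /* set when the last reference is dropped */
};

/* Drop one reference; the last one out stands in for the destroy callback. */
static int model_deref(struct model_cep *cep)
{
    int remaining = --cep->ref_cnt;
    if (remaining == 0)
        cep->destroyed = 1;
    return remaining;
}

/* Models kal_cep_destroy: drop the owner's reference, report leftovers. */
static int model_cep_destroy(struct model_cep *cep)
{
    int remaining = model_deref(cep);
    if (remaining > 0)  /* not necessarily an error: MADs may still complete */
        printf("kal_cep_destroy 0x%x ref 0x%x\n", cep->cid, (unsigned)remaining);
    return remaining;
}

/* Models cm_destroy_id. mads_completed is the number of outstanding MADs
 * that release their reference before the wait would expire. */
static unsigned model_destroy_id(struct model_cep *cep, int mads_completed)
{
    model_cep_destroy(cep);
    while (mads_completed-- > 0 && cep->ref_cnt > 0)
        model_deref(cep);
    if (!cep->destroyed) {
        /* A reference was never released, so the wait can only time out. */
        printf("cm_destroy_id 0x%x cid 0x%x\n", WAIT_TIMED_OUT, cep->cid);
        return WAIT_TIMED_OUT;
    }
    return WAIT_OK;
}

/* Build a cep with the given number of outstanding MADs and destroy it. */
static unsigned run_scenario(int outstanding_mads, int mads_completed)
{
    struct model_cep cep = { 0x185, 1 + outstanding_mads, 0 };
    return model_destroy_id(&cep, mads_completed);
}
```

Calling run_scenario(2, 2) models a destroy where both MADs complete and the wait succeeds. run_scenario(1, 0) models a MAD that never drops its reference, in which case no wait timeout is long enough.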
Here's a snapshot of the output:
...
kal_cep_destroy FFFFFA8004A93BB0 0x318 ref 0x2 signal 0
kal_cep_destroy FFFFFA80045A7750 0x71 ref 0x2 signal 0
^^^ these are okay, cm_destroy_id completes normally.
signal = 0 means that we're not in a callback for the cid, so
both references are the result of outstanding MADs
kal_cep_destroy FFFFFA80044F3BB0 0x185 ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x185
^^^ This and the lines below indicate a reference counting issue.
The extra reference is presumably held by an outstanding MAD that never
releases it. (Increasing the wait timeout doesn't help.)
kal_cep_destroy FFFFFA8005446420 0x7c2 ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x7c2
kal_cep_destroy FFFFFA80054205D0 0x7ab ref 0x1 signal 0
kal_cep_destroy FFFFFA800532ABB0 0x73f ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x7ab
kal_cep_destroy FFFFFA800521A2F0 0x6b0 ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x73f
kal_cep_destroy FFFFFA800480FBB0 0x1f8 ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x6b0
kal_cep_destroy FFFFFA80052D3010 0x711 ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x1f8
kal_cep_destroy FFFFFA8004FDB750 0x592 ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x711
kal_cep_destroy FFFFFA80053D9BB0 0x78a ref 0x1 signal 0
cm_destroy_id 0x102 cid 0x592
cm_destroy_id 0x102 cid 0x78a
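
Because the kal_cep_destroy and cm_destroy_id lines are interleaved, it can help to pair them up by cid mechanically. The following is a minimal C sketch of that, assuming the line formats are exactly as they appear in the snapshot above and that 0x102 on a cm_destroy_id line means the wait timed out. The names cid_record, scan_log, count_timed_out and sample_log are hypothetical helpers, not part of winverbs or ibal; sample_log is copied from the output above.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CIDS  64
#define TIMED_OUT 0x102u  /* status shown on the cm_destroy_id lines */

struct cid_record {
    unsigned cid;        /* connection id from the kal_cep_destroy line */
    unsigned ref;        /* references left after kal_cep_destroy */
    int      timed_out;  /* 1 if a matching cm_destroy_id timeout was seen */
};

/* Parse log lines into per-cid records; returns the number of cids seen. */
static int scan_log(const char *const *lines, int n, struct cid_record *recs)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        unsigned long long cep;
        unsigned cid, ref, sig, status;
        if (sscanf(lines[i], "kal_cep_destroy %llx %x ref %x signal %u",
                   &cep, &cid, &ref, &sig) == 4) {
            if (count < MAX_CIDS) {
                recs[count].cid = cid;
                recs[count].ref = ref;
                recs[count].timed_out = 0;
                count++;
            }
        } else if (sscanf(lines[i], "cm_destroy_id %x cid %x",
                          &status, &cid) == 2 && status == TIMED_OUT) {
            /* Mark the earlier kal_cep_destroy record for this cid. */
            for (int j = 0; j < count; j++)
                if (recs[j].cid == cid)
                    recs[j].timed_out = 1;
        }
    }
    return count;
}

/* Count the cids whose destroy wait timed out. */
static int count_timed_out(const struct cid_record *recs, int count)
{
    int hung = 0;
    for (int i = 0; i < count; i++)
        hung += recs[i].timed_out;
    return hung;
}

/* A few lines copied from the snapshot above. */
static const char *const sample_log[] = {
    "kal_cep_destroy FFFFFA8004A93BB0 0x318 ref 0x2 signal 0",
    "kal_cep_destroy FFFFFA80044F3BB0 0x185 ref 0x1 signal 0",
    "cm_destroy_id 0x102 cid 0x185",
    "kal_cep_destroy FFFFFA8005446420 0x7c2 ref 0x1 signal 0",
    "cm_destroy_id 0x102 cid 0x7c2",
};
```

Running scan_log over sample_log and then count_timed_out over the result separates the cids that completed normally from the ones that hung, along with the ref count each had when kal_cep_destroy returned.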