[ofw] opensm stuck upon kill

Smith, Stan stan.smith at intel.com
Thu Feb 2 10:50:33 PST 2012


>-----Original Message-----
>From: Leonid Keller [mailto:leonid at mellanox.com]
>Sent: Thursday, February 02, 2012 8:42 AM
>To: Hefty, Sean; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: RE: opensm stuck upon kill
>
>I no longer have the crashed machine.
>It was rebooted, and creation of the full dump failed.
>
>I can't say anything about the MADs, but I found only one place where an AV is created and attached to the PD: in the send_mad call.
>I also saw that the PD has ref_cnt = 227.
>I think these are references held by unreleased AVs, i.e. MADs.
>
>Could you tell me where I can see the unreleased MADs?
>The hang happened after WmProviderDeregister() and destroy_qp.
>WmProviderDeregister is supposed to release all the queued MADs.
>Could there be some MADs that are no longer, or not yet, in the queue?

Check opensm\user\libvendor\osm_vendor_ibumad.c
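
To make the suspicion quoted above concrete, here is a minimal sketch of how unreleased AVs could keep a PD from ever being deallocated. It assumes the PD takes one reference per attached AV, which is Leonid's reading of ref_cnt = 227 and not something confirmed in this thread. All names (sketch_pd_t, sketch_av_t, sketch_create_av, sketch_destroy_av, sketch_pd_can_dealloc) are hypothetical stand-ins, not the real IBAL or winmad structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a protection domain. */
typedef struct sketch_pd {
    int ref_cnt;            /* assumed: one reference per live AV */
} sketch_pd_t;

/* Hypothetical stand-in for an address vector attached to a PD. */
typedef struct sketch_av {
    sketch_pd_t *pd;        /* owning PD, NULL once destroyed */
} sketch_av_t;

/* Creating an AV (e.g. on the send path) takes a reference on the PD. */
static void sketch_create_av(sketch_pd_t *pd, sketch_av_t *av)
{
    av->pd = pd;
    pd->ref_cnt++;
}

/* Destroying the AV (e.g. when the MAD is released) drops that reference. */
static void sketch_destroy_av(sketch_av_t *av)
{
    assert(av->pd != NULL && av->pd->ref_cnt > 0);
    av->pd->ref_cnt--;
    av->pd = NULL;
}

/*
 * A synchronous dealloc would wait until ref_cnt reaches zero.
 * The sketch only reports whether that wait could finish, instead
 * of blocking the way the real call appears to.
 */
static bool sketch_pd_can_dealloc(const sketch_pd_t *pd)
{
    return pd->ref_cnt == 0;
}
```

If even one AV created this way is never destroyed, sketch_pd_can_dealloc() stays false, which would match a dealloc_pd that never returns while the PD still shows a nonzero ref_cnt.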

>
>-----Original Message-----
>From: Hefty, Sean [mailto:sean.hefty at intel.com]
>Sent: Thursday, February 02, 2012 6:28 PM
>To: Leonid Keller; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: RE: opensm stuck upon kill
>
>> winmad!WmRegRemoveHandler+0xae is stopped here:
>>
>> 	WmProviderDeregister(pRegistration->pProvider, pRegistration);
>> 	pRegistration->pDevice->IbInterface.destroy_qp(pRegistration->hQp, NULL);
>> 	pRegistration->pDevice->IbInterface.dealloc_pd(pRegistration->hPd, NULL);
>> >	pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);
>>
>> Do you have any ideas?
>
>winmad does not explicitly allocate any address handles.  Can you tell if there are any MADs which were not returned to the free pool?  You could try replacing the NULLs in the above code with ib_sync_destroy (unsure of exact name).


