[ofw] Disconnection problem and AL reference

Tzachi Dar tzachid at mellanox.co.il
Wed Oct 15 10:01:11 PDT 2008


Can you please send us the program that you have been using?
 
Thanks
Tzachi


________________________________

	From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
	Sent: Wednesday, October 15, 2008 4:05 AM
	To: James Yang; ofw at lists.openfabrics.org
	Subject: RE: [ofw] Disconnection problem and AL reference
	
	

	Also there is an error message from windbg:

	 

	[MLX4_HCA] mlnx_query_ca() :***ERROR***  ib_query_device failed (-16)

	 

	What does this mean? I think the CA handle we opened never got
destroyed after we closed it.

	 

	Thanks,

	James

	 

	
________________________________


	From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
	Sent: Tuesday, October 14, 2008 3:02 PM
	To: ofw at lists.openfabrics.org
	Subject: [ofw] Disconnection problem and AL reference

	 

	Hi,

	 

	Our driver product is based on WinOF1.1. Recently I hit a
problem where Windows cannot shut down. The procedure and observations
are as follows:

	 

	Install the driver, then reboot the system while there is still
some traffic going on.

	 

	We do the following in our driver, and everything seems to work
until reboot.

	 

	*         create_cq(): one receive completion queue and one send
completion queue, each with its callback function set

	 

	*         create_qp() with the queues created above, and set the
initial state to IB_QPS_INIT

	 

	*         cm_req() with the QP and correct connection path

	 

	*         post_recv() with 100 packet buffers for receiving data

	 

	*         post_send() when necessary

	 

	 

	Receive and send work fine, with the respective callback invoked
whenever there is data activity.

	 

	At a certain point during shutdown, when we call cm_dreq() to
initiate a disconnect, the 100 receive work items are never released and
the callback functions are never called. If we go on to destroy the QP,
the end result is that the IB stack cannot do its cleanup because it
still holds an extra reference count. A message similar to the following
line shows up in the debug version:

	 

	[AL]print_al_obj() !ERROR!: AL object fffffadf379c8280(AL_OBJ_TYPE_H_AL),

	 

	 

	It seems the AL handle we opened can't be destroyed. But I
suspect we may already be in a bad state before that.
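
	The reference-count behaviour described above can be illustrated
with a small toy model in C. This is only a sketch under assumptions: the
toy_* names are hypothetical and are not part of the IBAL API, and the
model assumes that each posted receive holds a reference on its QP and
that each child object holds one reference on its parent AL handle. Under
those assumptions, trying to destroy the QP before its receives are
flushed leaves the parent AL handle with a ref_cnt of 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object; NOT an IBAL structure. */
typedef struct toy_obj {
    struct toy_obj *parent;  /* object this one holds a reference on */
    int ref_cnt;             /* references held on this object */
} toy_obj_t;

/* Create an object; a child takes one reference on its parent. */
static void toy_init(toy_obj_t *obj, toy_obj_t *parent)
{
    obj->parent = parent;
    obj->ref_cnt = 0;
    if (parent != NULL)
        parent->ref_cnt++;
}

static void toy_ref(toy_obj_t *obj)
{
    obj->ref_cnt++;
}

static void toy_deref(toy_obj_t *obj)
{
    assert(obj->ref_cnt > 0);
    obj->ref_cnt--;
}

/*
 * Returns true if the destroy can complete. False stands in for the
 * case where a synchronous destroy would have to wait (assumption).
 */
static bool toy_try_destroy(toy_obj_t *obj)
{
    if (obj->ref_cnt != 0)
        return false;
    if (obj->parent != NULL) {
        toy_deref(obj->parent);  /* release the reference on the parent */
        obj->parent = NULL;
    }
    return true;
}

/*
 * Model the shutdown: post 100 receives on a QP, optionally flush them,
 * try to destroy the QP, and report the AL handle's remaining ref_cnt.
 */
static int toy_shutdown(bool flush_first)
{
    toy_obj_t al, qp;
    int i;

    toy_init(&al, NULL);
    toy_init(&qp, &al);

    for (i = 0; i < 100; i++)
        toy_ref(&qp);            /* each posted receive pins the QP */

    if (flush_first) {
        for (i = 0; i < 100; i++)
            toy_deref(&qp);      /* completion callback releases it */
    }

    (void)toy_try_destroy(&qp);
    return al.ref_cnt;
}
```

	In this model toy_shutdown(false) leaves the AL handle with one
outstanding reference, while toy_shutdown(true) leaves it with none.
Whether the real stack accounts for references this way is an assumption
of the sketch.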

	 

	WinDbg stack, this is on x64 Win2003 Server:

	        fffffadf`2664e880 fffff800`01027682 nt!KiSwapContext+0x85
	        fffffadf`2664ea00 fffff800`0102828e nt!KiSwapThread+0x3c9
	        fffffadf`2664ea60 fffffadf`25ac7a3d nt!KeWaitForSingleObject+0x5a6
	        fffffadf`2664eae0 fffffadf`25b5fca8 ibbus!cl_event_wait_on+0x11d [c:\windows-openib\src\winib-1176g\core\complib\kernel\cl_event.c @ 59]
	        fffffadf`2664eb40 fffffadf`25b0013b ibbus!sync_destroy_obj+0x228 [c:\windows-openib\src\winib-1176g\core\al\al_common.c @ 513]
	        fffffadf`2664ebb0 fffffadf`25a1f8c7 ibbus!ib_close_al+0x3bb [c:\windows-openib\src\winib-1176g\core\al\al.c @ 89]
	        fffffadf`2664ec10 fffffadf`25a1b23f MyDriver!IBAccessLayer::Close+0x77

	 

	The al handle ref_cnt is 1 here.

	 

	Can anyone shed some light on this? Is this a known issue which
is fixed in WinOF2.0 or is it an unknown problem?

	 

	Thanks,

	James
