[ofw] Disconnection problem and AL reference
Tzachi Dar
tzachid at mellanox.co.il
Wed Oct 15 10:01:11 PDT 2008
Can you please send us the program that you have been using?
Thanks
Tzachi
________________________________
From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
Sent: Wednesday, October 15, 2008 4:05 AM
To: James Yang; ofw at lists.openfabrics.org
Subject: RE: [ofw] Disconnection problem and AL reference
Also, there is an error message from WinDbg:
[MLX4_HCA] mlnx_query_ca() :***ERROR*** ib_query_device failed (-16)
What does this mean? I think the CA handle we opened was never
destroyed after we closed it.
Thanks,
James
________________________________
From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
Sent: Tuesday, October 14, 2008 3:02 PM
To: ofw at lists.openfabrics.org
Subject: [ofw] Disconnection problem and AL reference
Hi,
Our driver product is based on WinOF1.1. Recently I saw a
problem where Windows cannot shut down. The procedure and observations
are as follows:
Install the driver and, while there is still some traffic going on,
reboot the system.
We do the following in our driver, and everything seems to work
until reboot.
* create_cq(): one receive queue and one send queue, with the
  callback function set
* create_qp() with the queues created above, with the initial state
  set to IB_QPS_INIT
* cm_req() with the QP and the correct connection path
* post_recv() with 100 packet buffers for receiving data
* post_send() when necessary
Receive and send work fine, with the respective callback invoked
whenever there is data activity.
At a certain point during shutdown, when we call cm_dreq() to
initiate a disconnect, the 100 receive work items are never released
and the callback functions are never called. If we go on to destroy
the QP, the end result is that the IB stack can't do its cleanup work
because it still holds an extra reference count. A message similar to
the following line shows up in the debug version:
[AL]print_al_obj() !ERROR!: AL object fffffadf379c8280(AL_OBJ_TYPE_H_AL),
It seems the AL handle we opened can't be destroyed, but I suspect
we may already be in a bad state before that.
WinDbg stack, this is on x64 Win2003 server:
fffffadf`2664e880 fffff800`01027682 nt!KiSwapContext+0x85
fffffadf`2664ea00 fffff800`0102828e nt!KiSwapThread+0x3c9
fffffadf`2664ea60 fffffadf`25ac7a3d nt!KeWaitForSingleObject+0x5a6
fffffadf`2664eae0 fffffadf`25b5fca8 ibbus!cl_event_wait_on+0x11d [c:\windows-openib\src\winib-1176g\core\complib\kernel\cl_event.c @ 59]
fffffadf`2664eb40 fffffadf`25b0013b ibbus!sync_destroy_obj+0x228 [c:\windows-openib\src\winib-1176g\core\al\al_common.c @ 513]
fffffadf`2664ebb0 fffffadf`25a1f8c7 ibbus!ib_close_al+0x3bb [c:\windows-openib\src\winib-1176g\core\al\al.c @ 89]
fffffadf`2664ec10 fffffadf`25a1b23f MyDriver!IBAccessLayer::Close+0x77
The al handle ref_cnt is 1 here.
Can anyone shed some light on this? Is this a known issue that
is fixed in WinOF2.0, or is it an unknown problem?
Thanks,
James