[ofw] Disconnection problem and AL reference

James Yang jyang at xsigo.com
Tue Oct 14 19:04:46 PDT 2008


Also there is an error message from windbg:

 

[MLX4_HCA] mlnx_query_ca() :***ERROR***  ib_query_device failed (-16)

 

What does this mean? I think the CA handle we opened never got
destroyed after we closed it.
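
One possible reading of the -16, assuming the mlx4 low-level driver reports
negated Linux-style errno values (an assumption, not confirmed in this
thread): 16 is EBUSY, so the call would be failing with "device or resource
busy". A minimal sketch of that mapping in plain C follows; the helper
names are hypothetical and not part of WinOF.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helpers (not part of WinOF): interpret a negated
 * errno-style status such as the -16 in the log line above.
 * Assumes the driver uses Linux-style errno numbering. */
static int status_is_busy(int status)
{
    return -status == EBUSY;
}

/* Map the status to the C library's text for that errno value. */
static const char *status_to_text(int status)
{
    return strerror(status < 0 ? -status : status);
}
```

With these helpers, status_is_busy(-16) is true, and status_to_text(-16)
returns whatever text the C library uses for EBUSY.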

 

Thanks,

James

 

________________________________

From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
Sent: Tuesday, October 14, 2008 3:02 PM
To: ofw at lists.openfabrics.org
Subject: [ofw] Disconnection problem and AL reference

 

Hi,

 

Our driver product is based on WinOF 1.1. Recently I hit a problem where
Windows cannot shut down. The procedure and observations are as follows:

 

Install the driver and, while there is still some traffic going on,
reboot the system.

 

We do the following in our driver, and everything seems to work until
the reboot:

 

*  create_cq(): one receive queue and one send queue, and set the
   callback function
*  create_qp() with the queues created above, and set the initial state
   to IB_QPS_INIT
*  cm_req() with the QP and the correct connection path
*  post_recv() with 100 packet buffers for receiving data
*  post_send() when necessary

 

 

Receives and sends work fine, with the respective callback invoked
whenever there is data activity.
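
One way to verify at shutdown that every posted receive has come back is to
keep a per-QP count of outstanding work requests. The sketch below is a
standalone model, not IBAL code: wr_tracker_t and its functions are
hypothetical names, and it assumes that every posted work request
eventually produces exactly one completion, successful or flushed.

```c
#include <stdbool.h>

/* Hypothetical per-QP bookkeeping; not an IBAL type.
 * A real kernel driver would update these with interlocked operations. */
typedef struct wr_tracker {
    long posted;     /* work requests successfully posted */
    long completed;  /* completions seen in the CQ callback, any status */
} wr_tracker_t;

static void wr_tracker_init(wr_tracker_t *t)
{
    t->posted = 0;
    t->completed = 0;
}

/* Call after a successful post of `count` work requests. */
static void wr_posted(wr_tracker_t *t, long count)
{
    t->posted += count;
}

/* Call once per completion, including flushed ones. */
static void wr_completed(wr_tracker_t *t)
{
    t->completed += 1;
}

static long wr_outstanding(const wr_tracker_t *t)
{
    return t->posted - t->completed;
}

/* True when, under the stated assumption, nothing is still pending. */
static bool wr_all_returned(const wr_tracker_t *t)
{
    return wr_outstanding(t) == 0;
}
```

If wr_outstanding() is still nonzero after the disconnect has been
requested, the receives have not been returned yet, which is the condition
worth checking before destroying the QP.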

 

At a certain point during shutdown, when we call cm_dreq() to initiate a
disconnect, the 100 receive work items are never released and the
callback functions are never called. If we go on to destroy the QP, the
IB stack cannot finish its cleanup because an extra reference is still
held. A message similar to the following shows up in the debug build:

 

[AL]print_al_obj() !ERROR!: AL object fffffadf379c8280(AL_OBJ_TYPE_H_AL),
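
The symptom above is what a reference-counted parent/child teardown looks
like when a child never releases its reference. The toy model below (plain
C, hypothetical names, not the actual AL implementation) illustrates the
idea: each child pins its parent when created, pending work pins the object
that owns it, and an object can only be destroyed once its count is zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a reference-counted object; hypothetical, not al_obj_t. */
typedef struct obj {
    struct obj *parent;
    int ref_cnt;        /* references held by children or pending work */
    bool destroyed;
} obj_t;

static void obj_init(obj_t *o, obj_t *parent)
{
    o->parent = parent;
    o->ref_cnt = 0;
    o->destroyed = false;
    if (parent != NULL)
        parent->ref_cnt++;      /* child pins its parent */
}

/* Pending work (for example a posted receive) pins its owner. */
static void obj_ref(obj_t *o)   { o->ref_cnt++; }
static void obj_deref(obj_t *o) { o->ref_cnt--; }

/* Returns true if the object is destroyed. A false return marks the
 * point where a synchronous destroy would block waiting for
 * ref_cnt to reach zero. */
static bool obj_try_destroy(obj_t *o)
{
    if (o->destroyed)
        return true;
    if (o->ref_cnt != 0)
        return false;
    o->destroyed = true;
    if (o->parent != NULL)
        o->parent->ref_cnt--;   /* release the pin on the parent */
    return true;
}
```

In this model, a QP with one receive that was never returned cannot be
destroyed, so its parent keeps a count of 1 and cannot be destroyed either.
Whether that is the actual cause here is not established; the model only
shows why the order of flush, destroy, and close matters.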

 

 

It seems the AL handle we opened can't be destroyed, but I suspect we
are already in a bad state before that.

 

WinDbg stack (this is on x64 Windows Server 2003):

        fffffadf`2664e880 fffff800`01027682 nt!KiSwapContext+0x85
        fffffadf`2664ea00 fffff800`0102828e nt!KiSwapThread+0x3c9
        fffffadf`2664ea60 fffffadf`25ac7a3d nt!KeWaitForSingleObject+0x5a6
        fffffadf`2664eae0 fffffadf`25b5fca8 ibbus!cl_event_wait_on+0x11d [c:\windows-openib\src\winib-1176g\core\complib\kernel\cl_event.c @ 59]
        fffffadf`2664eb40 fffffadf`25b0013b ibbus!sync_destroy_obj+0x228 [c:\windows-openib\src\winib-1176g\core\al\al_common.c @ 513]
        fffffadf`2664ebb0 fffffadf`25a1f8c7 ibbus!ib_close_al+0x3bb [c:\windows-openib\src\winib-1176g\core\al\al.c @ 89]
        fffffadf`2664ec10 fffffadf`25a1b23f MyDriver!IBAccessLayer::Close+0x77

 

The AL handle's ref_cnt is 1 here.

 

Can anyone shed some light on this? Is this a known issue that is fixed
in WinOF 2.0, or is it a new problem?

 

Thanks,

James
