[ofw] Disconnection problem and AL reference

James Yang jyang at xsigo.com
Tue Oct 14 15:02:06 PDT 2008


Hi,

 

Our driver product is based on WinOF 1.1. Recently I ran into a problem where Windows cannot shut down. The procedure and observations are as follows:

 

Install the driver, then reboot the system while there is still some traffic going on.

 

We do the following in our driver, and everything seems to work until the reboot.

 

*         create_cq() :   one receive queue and one send queue, and set the callback function

 

*         create_qp() with the above created queues, and set init state IB_QPS_INIT

 

*         cm_req() with the QP and correct connection path

 

*         post_recv() with 100 packet buffers for receiving data

 

*         post_send() when necessary

 

 

Receive and send work fine, with the respective callbacks invoked whenever there is data activity.

 

At a certain point during shutdown, when we call cm_dreq() to initiate a disconnect, the 100 receive work items are never released and the callback functions are never called. If we go on to destroy the QP, the end result is that the IB stack cannot do its cleanup work because it still holds an extra reference. A message similar to the following line shows up in the debug version:

 

[AL]print_al_obj() !ERROR!: AL object fffffadf379c8280(AL_OBJ_TYPE_H_AL),

 

 

It seems the AL handle we opened can't be destroyed, but I suspect we may already be in a bad state before that.
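
To make the suspected "bad state" concrete, here is a minimal sketch in plain C of what is normally expected of posted receives. In InfiniBand generally, work requests still posted when a QP enters the Error state are completed with a flush status, so each of the 100 receives should get its callback once. Everything below (toy_qp, toy_post_recv(), toy_qp_to_error(), the WC_FLUSHED status) is a hypothetical toy model, not the IBAL API, and it says nothing about what cm_dreq() actually does in WinOF 1.1.

```c
#include <assert.h>

/* Toy model only: these types and functions are hypothetical and are
 * NOT the IBAL / WinOF API. */
typedef enum { QP_RESET, QP_INIT, QP_ERROR } toy_qp_state;
typedef enum { WC_SUCCESS, WC_FLUSHED } toy_wc_status;

typedef void (*toy_recv_cb)(int wr_id, toy_wc_status status, void *ctx);

typedef struct {
    toy_qp_state state;
    int posted_recvs;     /* receive work requests still owned by the QP */
    toy_recv_cb on_recv;  /* completion callback, as set at CQ creation */
    void *ctx;
} toy_qp;

/* Create the QP in the INIT state (the post uses IB_QPS_INIT). */
static void toy_qp_init(toy_qp *qp, toy_recv_cb cb, void *ctx)
{
    qp->state = QP_INIT;
    qp->posted_recvs = 0;
    qp->on_recv = cb;
    qp->ctx = ctx;
}

/* Post `count` receive buffers; rejected while the QP is in RESET. */
static int toy_post_recv(toy_qp *qp, int count)
{
    if (qp->state == QP_RESET)
        return -1;
    qp->posted_recvs += count;
    return 0;
}

/* Entering the Error state completes every outstanding receive with a
 * flush status, so the owner can release its buffers. */
static void toy_qp_to_error(toy_qp *qp)
{
    qp->state = QP_ERROR;
    while (qp->posted_recvs > 0) {
        qp->posted_recvs--;
        qp->on_recv(qp->posted_recvs, WC_FLUSHED, qp->ctx);
    }
}

/* Callback that counts flushed completions. */
static void count_flushed(int wr_id, toy_wc_status status, void *ctx)
{
    (void)wr_id;
    if (status == WC_FLUSHED)
        ++*(int *)ctx;
}

/* Post 100 receives, force the QP into Error, return the flush count. */
int toy_flush_demo(void)
{
    int flushed = 0;
    toy_qp qp;

    toy_qp_init(&qp, count_flushed, &flushed);
    if (toy_post_recv(&qp, 100) != 0)
        return -1;
    toy_qp_to_error(&qp);
    assert(qp.posted_recvs == 0);
    return flushed;
}
```

In this model toy_flush_demo() returns 100, one flushed completion per posted buffer. The symptom described above corresponds to that loop never running, which leaves all 100 receives outstanding.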

 

WinDbg stack (this is on x64 Windows Server 2003):

        fffffadf`2664e880 fffff800`01027682 nt!KiSwapContext+0x85

        fffffadf`2664ea00 fffff800`0102828e nt!KiSwapThread+0x3c9

        fffffadf`2664ea60 fffffadf`25ac7a3d nt!KeWaitForSingleObject+0x5a6

        fffffadf`2664eae0 fffffadf`25b5fca8 ibbus!cl_event_wait_on+0x11d [c:\windows-openib\src\winib-1176g\core\complib\kernel\cl_event.c @ 59]

        fffffadf`2664eb40 fffffadf`25b0013b ibbus!sync_destroy_obj+0x228 [c:\windows-openib\src\winib-1176g\core\al\al_common.c @ 513]

        fffffadf`2664ebb0 fffffadf`25a1f8c7 ibbus!ib_close_al+0x3bb [c:\windows-openib\src\winib-1176g\core\al\al.c @ 89]

        fffffadf`2664ec10 fffffadf`25a1b23f MyDriver!IBAccessLayer::Close+0x77 

 

The al handle ref_cnt is 1 here.
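
One way to read the trace: a synchronous destroy waits until the object's reference count drops to zero, and a child object that cannot be destroyed keeps its reference on the parent. The sketch below is a toy model in plain C of that mechanism. All names (toy_obj, toy_ref(), toy_try_destroy(), toy_close_demo()) are hypothetical and do not correspond to the IBAL implementation. The assumption that each posted receive pins the QP, and that the QP pins the AL handle, is mine and is not confirmed by the trace.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT the IBAL implementation. */
typedef struct toy_obj {
    struct toy_obj *parent;  /* object this one holds a reference on */
    int ref_cnt;             /* references other parties hold on this object */
    int destroyed;
} toy_obj;

/* Create an object; a child takes one reference on its parent. */
static void toy_init(toy_obj *o, toy_obj *parent)
{
    o->parent = parent;
    o->ref_cnt = 0;
    o->destroyed = 0;
    if (parent != NULL)
        parent->ref_cnt++;
}

/* Take or drop a reference, e.g. for one outstanding work request. */
static void toy_ref(toy_obj *o)   { o->ref_cnt++; }
static void toy_deref(toy_obj *o) { assert(o->ref_cnt > 0); o->ref_cnt--; }

/* Returns 1 if the object was destroyed, 0 if a synchronous destroy
 * would have to wait because references are still held. */
static int toy_try_destroy(toy_obj *o)
{
    if (o->destroyed)
        return 1;
    if (o->ref_cnt != 0)
        return 0;
    o->destroyed = 1;
    if (o->parent != NULL)
        toy_deref(o->parent);  /* child releases its hold on the parent */
    return 1;
}

/* Open an AL handle and a QP, post 100 receives, optionally release
 * them, then tear down. Returns the reference count left on the AL. */
int toy_close_demo(int release_recvs)
{
    toy_obj al, qp;
    int i;

    toy_init(&al, NULL);
    toy_init(&qp, &al);
    for (i = 0; i < 100; i++)
        toy_ref(&qp);            /* assumed: each posted receive pins the QP */
    if (release_recvs)
        for (i = 0; i < 100; i++)
            toy_deref(&qp);      /* receives completed or flushed */

    toy_try_destroy(&qp);
    toy_try_destroy(&al);
    return al.ref_cnt;
}
```

With the receives released, toy_close_demo(1) returns 0 and the AL object is destroyed. With them left outstanding, toy_close_demo(0) returns 1: the QP cannot go away, so its single reference on the AL remains. That matches the ref_cnt of 1 seen here, though the real cause inside ibbus may be different.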

 

Can anyone shed some light on this? Is this a known issue that is fixed in WinOF 2.0, or is it an unknown problem?

 

Thanks,

James
