[ofw] RE: Disconnection problem and AL reference

James Yang jyang at xsigo.com
Wed Oct 15 15:02:11 PDT 2008


Hi Fab,

The QP reference count is 1 before calling cm_dreq().

----here is the call---
disconnectRequest.h_qp = qpHandle;
disconnectRequest.flags = 0;
disconnectRequest.p_dreq_pdata = NULL;
disconnectRequest.dreq_length = 0;
disconnectRequest.qp_type = IB_QPT_RELIABLE_CONN;
disconnectRequest.pfn_cm_drep_cb = IBConnectionDisconnectCallback;

cm_dreq(&disconnectRequest);
----------------------------

During the cm_dreq() call, p_cep->state in al_cep_dreq() is CEP_STATE_TIMEWAIT, so the QP is not touched. That may be the reason the callback never got called.
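
The reasoning above can be illustrated with a small model. This is only a sketch under stated assumptions, not the code in al_cm_cep.c: the types, the two-state enum, and model_cep_dreq() are hypothetical names made up for illustration. It assumes the disconnect path only proceeds from an established connection and returns early in any other state, in which case nothing is sent and no callback can ever fire.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model. These names and enum values are
 * illustrative only and do not match the real IBAL definitions. */
typedef enum {
    MODEL_CEP_ESTABLISHED,
    MODEL_CEP_TIMEWAIT
} model_cep_state_t;

typedef enum {
    MODEL_OK,
    MODEL_INVALID_STATE
} model_status_t;

typedef struct {
    model_cep_state_t state;
    int dreq_sent;       /* 1 once a DREQ would have been sent */
    int callback_armed;  /* 1 once a DREP/timeout callback can fire */
} model_cep_t;

/* Assumption: a DREQ is only issued from the established state.
 * In any other state the function returns early and leaves the
 * connection (and therefore the QP) untouched. */
static model_status_t model_cep_dreq(model_cep_t *cep)
{
    assert(cep != NULL);

    switch (cep->state) {
    case MODEL_CEP_ESTABLISHED:
        cep->dreq_sent = 1;
        cep->callback_armed = 1;
        return MODEL_OK;
    default:
        /* Early return: nothing sent, so no callback will arrive. */
        return MODEL_INVALID_STATE;
    }
}
```

If the real code behaves like this model, then checking the status returned by cm_dreq() would show right away whether the request was rejected because of the CEP state, rather than waiting for a callback that cannot arrive.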

But why did we end up in the CEP_STATE_TIMEWAIT state? Could some previous command have caused the problem? Here is the structure; does any of the data look suspicious?

ibbus_fffffadf24334000!_al_kcep
   +0x000 cid              : 1
   +0x008 context          : 0xfffffadf`3682aa10 
   +0x010 p_cid            : 0xfffffadf`37d69030 _cep_cid
   +0x018 sid              : 0x1971302`00000000
   +0x020 port_guid        : 0
   +0x028 p_cmp_buf        : (null) 
   +0x030 cmp_offset       : 0 ''
   +0x031 cmp_len          : 0 ''
   +0x034 p2p              : 0
   +0x038 al_item          : _cl_list_item
   +0x050 signalled        : 1
   +0x058 pfn_destroy_cb   : (null) 
   +0x060 p_mad_head       : (null) 
   +0x068 p_mad_tail       : 0xfffffadf`364ec558 _ib_mad_element
   +0x070 pfn_cb           : 0xfffffadf`243ad050     void  ibbus_fffffadf24334000!__cm_handler+0
   +0x078 p_irp            : (null) 
   +0x080 listen_item      : _cl_rbmap_item
   +0x0a8 rem_id_item      : _cl_rbmap_item
   +0x0d0 rem_qp_item      : _cl_rbmap_item
   +0x0f8 local_comm_id    : 0x2000001
   +0x0fc remote_comm_id   : 0x831d3e8b
   +0x100 local_ca_guid    : 0xdc8c0200`03c90200
   +0x108 remote_ca_guid   : 0xe0000001`02971300
   +0x110 remote_qpn       : 0xa04b700
   +0x114 sq_psn           : 0xa04b700
   +0x118 rq_psn           : 0x48000000
   +0x11c resp_res         : 0x4 ''
   +0x11d init_depth       : 0x4 ''
   +0x11e rnr_nak_timeout  : 0x8 ''
   +0x120 local_qpn        : 0x48000000
   +0x124 pkey             : 0xffff
   +0x126 req_init_depth   : 0 ''
   +0x128 av               : [2] _al_kcep_av
   +0x1b8 idx_primary      : 0 ''
   +0x1c0 alt_av           : _al_kcep_av
   +0x208 alt_2pkt_life    : 0 ''
   +0x209 max_2pkt_life    : 0x13 ''
   +0x20a target_ack_delay : 0x14 ''
   +0x20b local_ack_delay  : 0xf ''
   +0x20c state            : 3 ( CEP_STATE_TIMEWAIT )
   +0x210 was_active       : 1
   +0x218 h_mad_svc        : 0xfffffadf`36845730 _al_mad_svc
   +0x220 p_send_mad       : (null) 
   +0x228 ref_cnt          : 1
   +0x230 tid              : 0x8b3e1d83`06000000
   +0x238 max_cm_retries   : 0x3 ''
   +0x23c retry_timeout    : 0x1920
   +0x240 timewait_timer   : _KTIMER
   +0x280 timewait_time    : _LARGE_INTEGER 0xffffffff`fc28f600
   +0x288 timewait_item    : _cl_list_item
   +0x2a0 p_mad            : (null) 
   +0x2a8 mads             : _mads
   +0x3a8 irp_que          : _LIST_ENTRY [ 0xfffffadf`3682a928 - 0xfffffadf`3682a928 ]
   +0x3b8 psize            : 0 ''
   +0x3b9 pdata            : [196]  ""

Thanks,
James


-----Original Message-----
From: Fab Tillier [mailto:ftillier at windows.microsoft.com] 
Sent: Wednesday, October 15, 2008 12:41 PM
To: James Yang; ofw at lists.openfabrics.org
Subject: RE: Disconnection problem and AL reference

Hi James,

> After calling cm_dreq(), my callback for it is not called and the
> work items also never get called. After cm_dreq() times out, I call
> destroy_qp(), which returns success. But the reference count of the QP
> is always 1.

When cm_dreq times out, you should get a DREP notification, and the QP should be in the error state.

Check the reference count on the QP before you call cm_dreq.  If you can, also check the reference count on the CEP for your QP.  When the cm_dreq times out (why is it timing out? Did the other side not reply?), check the CEP reference count again.  The timeout is processed in the __cep_mad_send_cb function in al_cm_cep.c.  Then walk the code to make sure the DREP callback is invoked or, if it isn't, find out why.  The CEP takes a reference on the QP; make sure that reference gets released when the QP is destroyed (QP destruction should destroy the CEP in the destroying callback for the QP object).
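
The reference relationship described above can be modeled in a few lines. This is a minimal sketch, not IBAL code: ref_qp_t, ref_cep_t, and the three helper functions are hypothetical names invented for illustration. It assumes only what the paragraph states, namely that the CEP holds one reference on the QP and that destroying the CEP from the QP's destroying callback releases it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the real al_qp / al_kcep structures. */
typedef struct {
    int ref_cnt;           /* references currently held on the QP */
} ref_qp_t;

typedef struct {
    ref_qp_t *qp;          /* QP this CEP is bound to, or NULL */
} ref_cep_t;

/* Binding the CEP to the QP takes one reference on the QP. */
static void ref_cep_bind(ref_cep_t *cep, ref_qp_t *qp)
{
    assert(cep != NULL && qp != NULL);
    cep->qp = qp;
    qp->ref_cnt++;
}

/* Destroying the CEP releases the reference it took, exactly once. */
static void ref_cep_destroy(ref_cep_t *cep)
{
    assert(cep != NULL);
    if (cep->qp != NULL) {
        cep->qp->ref_cnt--;
        cep->qp = NULL;
    }
}

/* Models QP destruction. run_destroying_cb selects whether the
 * destroying callback (which destroys the CEP) actually runs.
 * Returns the remaining reference count; a nonzero value means
 * something still pins the QP. */
static int ref_qp_destroy(ref_qp_t *qp, ref_cep_t *cep, int run_destroying_cb)
{
    assert(qp != NULL);
    if (run_destroying_cb && cep != NULL)
        ref_cep_destroy(cep);
    return qp->ref_cnt;
}
```

In this model, skipping the destroying callback leaves the count stuck at 1, which matches the symptom reported. That makes the CEP's reference a natural first thing to check at each of the checkpoints listed above.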

-Fab



