[ofw] RE: Disconnection problem and AL reference
Ishai Rabinovitz
ishai at mellanox.co.il
Sun Oct 19 04:31:57 PDT 2008
James:
Can you please open a bug for this in Bugzilla
(https://bugs.openfabrics.org/)? That way we will have all the data in
one place.
Do you also see the problem with the latest RC of WinOF 2.0?
Thanks
Ishai
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
> Sent: Thursday, October 16, 2008 12:02 AM
> To: Fab Tillier; ofw at lists.openfabrics.org
> Subject: [ofw] RE: Disconnection problem and AL reference
>
>
> Hi Fab,
>
> QP ref is 1 before calling cm_dreq().
>
> ----here is the call---
> disconnectRequest.h_qp = qpHandle;
> disconnectRequest.flags = 0;
> disconnectRequest.p_dreq_pdata = NULL;
> disconnectRequest.dreq_length = 0;
> disconnectRequest.qp_type = IB_QPT_RELIABLE_CONN;
> disconnectRequest.pfn_cm_drep_cb = IBConnectionDisconnectCallback;
>
> cm_dreq(&disconnectRequest);
> ----------------------------
>
> During the dreq() call, p_cep->state in al_cep_dreq() is already
> CEP_STATE_TIMEWAIT, so the QP is not touched. Maybe
> that's the reason the callback never got called.
>
> But why did we get the CEP_STATE_TIMEWAIT state? Did some previous
> commands cause the problem? Here is the structure; does any data look suspicious?
>
> ibbus_fffffadf24334000!_al_kcep
> +0x000 cid : 1
> +0x008 context : 0xfffffadf`3682aa10
> +0x010 p_cid : 0xfffffadf`37d69030 _cep_cid
> +0x018 sid : 0x1971302`00000000
> +0x020 port_guid : 0
> +0x028 p_cmp_buf : (null)
> +0x030 cmp_offset : 0 ''
> +0x031 cmp_len : 0 ''
> +0x034 p2p : 0
> +0x038 al_item : _cl_list_item
> +0x050 signalled : 1
> +0x058 pfn_destroy_cb : (null)
> +0x060 p_mad_head : (null)
> +0x068 p_mad_tail : 0xfffffadf`364ec558 _ib_mad_element
> +0x070 pfn_cb : 0xfffffadf`243ad050 void ibbus_fffffadf24334000!__cm_handler+0
> +0x078 p_irp : (null)
> +0x080 listen_item : _cl_rbmap_item
> +0x0a8 rem_id_item : _cl_rbmap_item
> +0x0d0 rem_qp_item : _cl_rbmap_item
> +0x0f8 local_comm_id : 0x2000001
> +0x0fc remote_comm_id : 0x831d3e8b
> +0x100 local_ca_guid : 0xdc8c0200`03c90200
> +0x108 remote_ca_guid : 0xe0000001`02971300
> +0x110 remote_qpn : 0xa04b700
> +0x114 sq_psn : 0xa04b700
> +0x118 rq_psn : 0x48000000
> +0x11c resp_res : 0x4 ''
> +0x11d init_depth : 0x4 ''
> +0x11e rnr_nak_timeout : 0x8 ''
> +0x120 local_qpn : 0x48000000
> +0x124 pkey : 0xffff
> +0x126 req_init_depth : 0 ''
> +0x128 av : [2] _al_kcep_av
> +0x1b8 idx_primary : 0 ''
> +0x1c0 alt_av : _al_kcep_av
> +0x208 alt_2pkt_life : 0 ''
> +0x209 max_2pkt_life : 0x13 ''
> +0x20a target_ack_delay : 0x14 ''
> +0x20b local_ack_delay : 0xf ''
> +0x20c state : 3 ( CEP_STATE_TIMEWAIT )
> +0x210 was_active : 1
> +0x218 h_mad_svc : 0xfffffadf`36845730 _al_mad_svc
> +0x220 p_send_mad : (null)
> +0x228 ref_cnt : 1
> +0x230 tid : 0x8b3e1d83`06000000
> +0x238 max_cm_retries : 0x3 ''
> +0x23c retry_timeout : 0x1920
> +0x240 timewait_timer : _KTIMER
> +0x280 timewait_time : _LARGE_INTEGER 0xffffffff`fc28f600
> +0x288 timewait_item : _cl_list_item
> +0x2a0 p_mad : (null)
> +0x2a8 mads : _mads
> +0x3a8 irp_que : _LIST_ENTRY [ 0xfffffadf`3682a928 - 0xfffffadf`3682a928 ]
> +0x3b8 psize : 0 ''
> +0x3b9 pdata : [196] ""
>
> Thanks,
> James
>
>
> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
> Sent: Wednesday, October 15, 2008 12:41 PM
> To: James Yang; ofw at lists.openfabrics.org
> Subject: RE: Disconnection problem and AL reference
>
> Hi James,
>
> > After calling cm_dreq(), my callback for it is not called, and the
> > work items never get called either. After cm_dreq() times out, I call
> > destroy_qp(), which returns a successful status. But the reference
> > count of the QP is always 1.
>
> When cm_dreq times out, you should get a DREP notification,
> and the QP should be in the error state.
>
> Check the reference count on the QP before you call cm_dreq.
> If you can, also check the reference count on the CEP for
> your QP. When the cm_dreq times out (why is it timing out,
> did the other side not reply?) again check the CEP reference
> count. The timeout is processed in the __cep_mad_send_cb
> function in al_cm_cep.c. Then walk the code to make sure the
> DREP callback is invoked, or if it isn't, why. The CEP takes
> a reference on the QP; make sure that gets released when the
> QP is destroyed (QP destruction should destroy the CEP in the
> destroying callback for the QP object).
>
> -Fab
>
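One way to picture the behaviour James describes (a disconnect request issued while the CEP is already in CEP_STATE_TIMEWAIT never produces a callback) is the toy state machine below. It is only a sketch under stated assumptions: every toy_* name, the three states, and the status codes are hypothetical and are not taken from al_cm_cep.c. What it illustrates is that a caller who ignores the status returned by the disconnect request and simply waits for the callback can wait forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states and status codes, for illustration only.
 * They are NOT the definitions used by the real AL/CEP code. */
typedef enum { TOY_ESTABLISHED, TOY_DREQ_SENT, TOY_TIMEWAIT } toy_state_t;
typedef enum { TOY_SUCCESS, TOY_INVALID_STATE } toy_status_t;

typedef void (*toy_drep_cb_t)(void *context);

typedef struct {
    toy_state_t   state;        /* current connection state           */
    toy_drep_cb_t pfn_drep_cb;  /* callback armed by a successful DREQ */
    void         *context;      /* opaque pointer handed to callback   */
} toy_cep_t;

/* Request a disconnect. Only an established connection accepts it;
 * in any other state nothing is sent and no callback is armed. */
toy_status_t toy_dreq(toy_cep_t *cep, toy_drep_cb_t cb, void *context)
{
    assert(cep != NULL);
    if (cep->state != TOY_ESTABLISHED)
        return TOY_INVALID_STATE;
    cep->pfn_drep_cb = cb;
    cep->context = context;
    cep->state = TOY_DREQ_SENT;
    return TOY_SUCCESS;
}

/* Simulate the end of a pending DREQ (reply received or timeout).
 * The callback fires only if a DREQ was actually outstanding. */
void toy_dreq_done(toy_cep_t *cep)
{
    assert(cep != NULL);
    if (cep->state != TOY_DREQ_SENT)
        return;
    cep->state = TOY_TIMEWAIT;
    if (cep->pfn_drep_cb != NULL)
        cep->pfn_drep_cb(cep->context);
}

/* Callback that counts how many times it was invoked. */
static void toy_count_cb(void *context)
{
    ++*(int *)context;
}

/* Run one disconnect attempt from the given starting state and
 * return the number of callbacks delivered. The status of toy_dreq()
 * is deliberately ignored, as a caller that only waits for the
 * callback would do. */
int toy_count_callbacks(toy_state_t initial)
{
    toy_cep_t cep = { initial, NULL, NULL };
    int calls = 0;

    (void)toy_dreq(&cep, toy_count_cb, &calls);
    toy_dreq_done(&cep);
    return calls;
}
```

In this model, toy_count_callbacks(TOY_ESTABLISHED) reports one callback and toy_count_callbacks(TOY_TIMEWAIT) reports none, so checking the status returned by the request is the only way to notice the second case.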