[Openib-windows] Wrong allocation of mads in al_mad_pool.c (user mode) line 800

Tzachi Dar tzachid at mellanox.co.il
Wed Apr 5 08:28:54 PDT 2006


OK,

So here is another question:
When running SDP code with many connections simultaneously, I sometimes
hit an assert with the following call stack:

ChildEBP RetAddr  Args to Child              
f78ae758 8086b5d0 80889f00 00000003 f7717000 nt!DbgBreakPoint
f78aea48 8086b6f8 baaaf050 baaaf020 0000021c nt!RtlAssert2+0x104
f78aea64 baaaf234 baaaf050 baaaf020 0000021c nt!RtlAssert+0x18
f78aea88 baaaecb1 8a834970 89c23a30 89ed5e18 ibal!__reject_mad+0x164
[q:\projinf1\trunk\core\al\kernel\al_cm_cep.c @ 540]
f78aead4 baaa93e5 8a834970 89ed5e18 8a6b72a8 ibal!__process_rep+0x531
[q:\projinf1\trunk\core\al\kernel\al_cm_cep.c @ 1348]
f78aeb00 baa78465 8a81a168 ffffffff 8a834970 ibal!__cep_mad_recv_cb+0x1e5
[q:\projinf1\trunk\core\al\kernel\al_cm_cep.c @ 1885]
f78aeb34 baa6dccc 8a81a168 ffffffff 89ed5e18 ibal!__mad_svc_recv_done+0xa55
[q:\projinf1\trunk\core\al\al_mad.c @ 2206]
f78aeb94 bab273be 8a81b870 89ed5e18 f77179c0 ibal!mad_disp_recv_done+0x12ac
[q:\projinf1\trunk\core\al\al_mad.c @ 1004]
f78aebc0 bab26c8d 8a81c1c8 89ed5e18 00000001 ibal!process_mad_recv+0x31e
[q:\projinf1\trunk\core\al\kernel\al_smi.c @ 2284]
f78aec50 bab26632 8a81c1c8 8a8316b0 ffffffff ibal!spl_qp_comp+0x29d
[q:\projinf1\trunk\core\al\kernel\al_smi.c @ 2125]
f78aec78 baaa467b 8a8316b0 ffffffff 8a81c1c8 ibal!spl_qp_recv_comp_cb+0x112
[q:\projinf1\trunk\core\al\kernel\al_smi.c @ 1995]
f78aec94 bad69170 8a8316b0 f78aeca4 00000000 ibal!ci_ca_comp_cb+0x6b
[q:\projinf1\trunk\core\al\kernel\al_ci_ca.c @ 323]
f78aecb8 bad8aff4 8a81f6d8 8a860e38 85000000 mthca!cq_comp_handler+0xc0
[q:\projinf1\trunk\hw\mthca\kernel\hca_data.c @ 326]
f78aecd0 bad8d8a1 8a6b72a8 00000085 8a6b74a8 mthca!mthca_cq_completion+0xa4
[q:\projinf1\trunk\hw\mthca\kernel\mthca_cq.c @ 234]
f78aed04 bad8d5b6 8a6b72a8 8a6b77d8 8a6b72a8 mthca!mthca_eq_int+0x81
[q:\projinf1\trunk\hw\mthca\kernel\mthca_eq.c @ 328]
f78aed28 80805d22 8a6b7840 8a6b77d8 00000000 mthca!mthca_tavor_dpc+0x36
[q:\projinf1\trunk\hw\mthca\kernel\mthca_eq.c @ 455]
f78aed50 80805c07 00000000 0000000e 00000000 nt!KiRetireDpcList+0x61
f78aed54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x28

In my case, __reject_mad is called with p_cep->state ==
CEP_STATE_REQ_SENT and the reason is IB_REJ_STALE_CONN.

Actually, looking at the two functions __process_rep and __reject_mad,
it seems that every time the insert in __process_rep fails
(that is, if( __insert_cep( p_cep ) != p_cep )), we will reach an assert.
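
To make the condition concrete, here is a minimal sketch of the
"insert returns the existing entry on a key collision" pattern that the
comparison __insert_cep( p_cep ) != p_cep implies. It is not the IBAL
code: toy_cep_t, toy_map_t, toy_insert_cep and toy_insert_collides are
hypothetical names, the map is a plain linked list instead of an rbmap,
and keying on (remote_ca_guid, remote_comm_id) is my assumption. What
__reject_mad actually asserts on is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CEP: only the fields assumed to form the key. */
typedef struct toy_cep {
    unsigned long long remote_ca_guid;
    unsigned int       remote_comm_id;
    struct toy_cep    *next;
} toy_cep_t;

/* Hypothetical stand-in for the connection map (a linked list here). */
typedef struct toy_map {
    toy_cep_t *head;
} toy_map_t;

/*
 * Insert 'item' unless an entry with the same key is already present.
 * Returns 'item' on success, or the entry that already holds the key.
 */
toy_cep_t *toy_insert_cep(toy_map_t *map, toy_cep_t *item)
{
    toy_cep_t *cur;

    assert(map != NULL && item != NULL);

    for (cur = map->head; cur != NULL; cur = cur->next) {
        if (cur->remote_ca_guid == item->remote_ca_guid &&
            cur->remote_comm_id == item->remote_comm_id)
            return cur;            /* collision: hand back the existing entry */
    }

    item->next = map->head;        /* no collision: link the new entry in */
    map->head = item;
    return item;
}

/* Returns 1 when the caller would see the insert as failed, 0 otherwise. */
int toy_insert_collides(toy_map_t *map, toy_cep_t *item)
{
    return toy_insert_cep(map, item) != item;
}
```

In this toy model a second CEP with the same remote identifiers makes
toy_insert_collides() return 1, which corresponds to the branch in
__process_rep that ends up calling __reject_mad.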

Can you tell me what the problem is here?

Thanks
Tzachi

Some more information that might help:

1: kd> dt p_cep
Local var @ 0xf78aea94 Type _al_kcep*
0x89c23a30 
   +0x000 cid              : 5
   +0x004 context          : 0x8a741a18 
   +0x008 p_cid            : 0x8a45f05c _cep_cid
   +0x010 sid              : 0x22c80100`00000000
   +0x018 port_guid        : 0
   +0x020 p_cmp_buf        : (null) 
   +0x028 cmp_offset       : 0 ''
   +0x029 cmp_len          : 0 ''
   +0x02c p2p              : 0
   +0x030 al_item          : _cl_list_item
   +0x03c signalled        : 0
   +0x040 pfn_destroy_cb   : (null) 
   +0x048 p_mad_head       : (null) 
   +0x04c p_mad_tail       : (null) 
   +0x050 pfn_cb           : 0xbaa49af0     ibal!__cm_handler+0
   +0x054 p_irp            : (null) 
   +0x058 listen_item      : _cl_rbmap_item
   +0x06c rem_id_item      : _cl_rbmap_item
   +0x080 rem_qp_item      : _cl_rbmap_item
   +0x094 local_comm_id    : 0x6000005
   +0x098 remote_comm_id   : 0x2000021
   +0x0a0 local_ca_guid    : 0xa0392a01`00c90000
   +0x0a8 remote_ca_guid   : 0x20e92a01`00c90200
   +0x0b0 remote_qpn       : 0x18044200
   +0x0b4 sq_psn           : 0x18044200
   +0x0b8 rq_psn           : 0x19046b00
   +0x0bc resp_res         : 0x4 ''
   +0x0bd init_depth       : 0x4 ''
   +0x0be rnr_nak_timeout  : 0x6 ''
   +0x0c0 local_qpn        : 0x19046b00
   +0x0c4 pkey             : 0xffff
   +0x0c6 req_init_depth   : 0 ''
   +0x0c8 av               : [2] _al_kcep_av
   +0x158 idx_primary      : 0 ''
   +0x160 alt_av           : _al_kcep_av
   +0x1a8 alt_2pkt_life    : 0 ''
   +0x1a9 max_2pkt_life    : 0x13 ''
   +0x1aa target_ack_delay : 0xf ''
   +0x1ab local_ack_delay  : 0xf ''
   +0x1ac state            : 20000001 ( CEP_STATE_REQ_SENT )
   +0x1b0 was_active       : 1
   +0x1b8 h_mad_svc        : 0x8a81a168 _al_mad_svc
   +0x1c0 p_send_mad       : 0x8a6b6030 _ib_mad_element
   +0x1c4 ref_cnt          : 2
   +0x1c8 tid              : 0x5000005
   +0x1d0 max_cm_retries   : 0x4 ''
   +0x1d4 retry_timeout    : 0x1920
   +0x1d8 timewait_timer   : _KTIMER
   +0x200 timewait_time    : _LARGE_INTEGER 0x0
   +0x208 timewait_item    : _cl_list_item
   +0x214 p_mad            : (null) 
   +0x218 mads             : _mads

1: kd> dt -r1 ibal!gp_cep_mgr
0x8a6b2398 
   +0x000 obj              : _al_obj
      +0x000 pool_item        : _cl_pool_item
      +0x014 p_parent_obj     : 0x8a82a958 _al_obj
      +0x018 p_ci_ca          : (null) 
      +0x01c context          : (null) 
      +0x020 async_item       : _cl_async_proc_item
      +0x038 event            : _KEVENT
      +0x048 timeout_ms       : 0x2710
      +0x04c desc_cnt         : 0
      +0x050 pfn_destroy      : 0xbaa5aed0        ibal!sync_destroy_obj+0
      +0x054 pfn_destroying   : 0xbaab7950        ibal!__destroying_cep_mgr+0
      +0x058 pfn_cleanup      : (null) 
      +0x05c pfn_free         : 0xbaab7f20        ibal!__free_cep_mgr+0
      +0x060 user_destroy_cb  : (null) 
      +0x068 lock             : _cl_spinlock
      +0x070 obj_list         : _cl_qlist
      +0x084 ref_cnt          : 58
      +0x088 list_item        : _cl_list_item
      +0x094 type             : 0x16
      +0x098 state            : 2 ( CL_INITIALIZED )
      +0x0a0 hdl              : 0
      +0x0a8 h_al             : (null) 
      +0x0b0 hdl_valid        : 0
   +0x0b8 port_map         : _cl_qmap
      +0x000 root             : _cl_map_item
      +0x038 nil              : _cl_map_item
      +0x070 state            : 2 ( CL_INITIALIZED )
      +0x074 count            : 2
   +0x130 lock             : 0xf78aeabc
   +0x134 cid_vector       : _cl_vector
      +0x000 size             : 0xff
      +0x004 grow_size        : 0xff
      +0x008 capacity         : 0xff
      +0x00c element_size     : 0x10
      +0x010 pfn_init         : 0xbaab8200        ibal!__cid_init+0
      +0x014 pfn_dtor         : (null) 
      +0x018 pfn_copy         : 0xbab2b600        ibal!cl_vector_copy_general+0
      +0x01c context          : (null) 
      +0x020 alloc_list       : _cl_qlist
      +0x034 p_ptr_array      : 0x8a8193c8  -> 0x8a45f00c 
      +0x038 state            : 2 ( CL_INITIALIZED )
   +0x170 free_cid         : 0x1a
   +0x174 listen_map       : _cl_rbmap
      +0x000 root             : _cl_rbmap_item
      +0x014 nil              : _cl_rbmap_item
      +0x028 state            : 2 ( CL_INITIALIZED )
      +0x02c count            : 0
   +0x1a4 conn_id_map      : _cl_rbmap
      +0x000 root             : _cl_rbmap_item
      +0x014 nil              : _cl_rbmap_item
      +0x028 state            : 2 ( CL_INITIALIZED )
      +0x02c count            : 0x18
   +0x1d4 conn_qp_map      : _cl_rbmap
      +0x000 root             : _cl_rbmap_item
      +0x014 nil              : _cl_rbmap_item
      +0x028 state            : 2 ( CL_INITIALIZED )
      +0x02c count            : 0x19
   +0x208 cep_pool         : _NPAGED_LOOKASIDE_LIST
      +0x000 L                : _GENERAL_LOOKASIDE
      +0x048 Lock__ObsoleteButDoNotDelete : 0
   +0x258 req_pool         : _NPAGED_LOOKASIDE_LIST
      +0x000 L                : _GENERAL_LOOKASIDE
      +0x048 Lock__ObsoleteButDoNotDelete : 0
   +0x2a8 timewait_timer   : _cl_timer
      +0x000 timer            : _KTIMER
      +0x028 dpc              : _KDPC
      +0x048 pfn_callback     : 0xbaab6f80        ibal!__cep_timewait_cb+0
      +0x04c context          : (null) 
      +0x050 timeout_time     : 0x5d63a4f8
   +0x300 timewait_list    : _cl_qlist
      +0x000 end              : _cl_list_item
      +0x00c count            : 2
      +0x010 state            : 2 ( CL_INITIALIZED )
   +0x318 h_pnp            : 0x8a70f490 _al_pnp
      +0x000 obj              : _al_obj
      +0x0b8 async_item       : _cl_async_proc_item
      +0x0d0 p_sync_event     : 0xf78e2874 _KEVENT
      +0x0d4 list_item        : _cl_list_item
      +0x0e0 dereg_item       : _cl_async_proc_item
      +0x0f8 pnp_class        : 2
      +0x100 context_map      : _cl_qmap
      +0x178 p_rearm_irp      : (null) 
      +0x17c p_dereg_irp      : (null) 
      +0x180 pfn_pnp_cb       : 0xbaaa6ad0        ibal!__cep_pnp_cb+0
 

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Wednesday, April 05, 2006 12:37 AM
> To: Tzachi Dar
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] Wrong allocation of mads in 
> al_mad_pool.c (user mode) line 800
> 
> On 4/4/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> > OK, I understand my mistake now.
> >
> > There wasn't a particular problem that I was trying to solve. When
> > looking for the error in handling mads that were received with GRH, I
> > saw this code, and I thought there was an error here.
> >
> > Thanks again
> 
> No worries, Tzachi.  Please ask all you want - it will mean 
> more of us are intricately familiar with the internals of IBAL.
> 
> - Fab
> 
> 
> 


