[ofw] RE: Is there a race in __endpt_mgr_add_local?
Alex Estrin
alex.estrin at qlogic.com
Sun Oct 26 20:06:52 PDT 2008
Hi Tzachi,
Here are my thoughts on the points you have described.
(1)(2) ipoib_port_deref() call is expected only as a completion call for async calls when ipoib_port_ref called first to prevent port obj go away (due to miniport reset, unload etc..) until async call completes.
If you observed ..port_deref() that destroyed port obj, than most likely port_deref() doesn't have matched port_ref().
Could it be introduced by "local mad" patch( please see (4) ? I don't have code available right now, so can't be sure .
(4)__endpt_mgr_add_local() was moved to __bcast_cb() as a result of implementing patch "local mad". Patch eliminated port info SA query along with it's completion call _port_info_cb() where ...add_local() used to be.
In bcast_cb() port obj lock is released before __endpt_mgr_add_local() called. I believe lock should be released after the call.
(5)(6) agree.
(3) is not relevant to the problem.Probaly can be done separately?
Thanks,
Alex.
________________________________
From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
Sent: Sun 10/26/2008 1:01 PM
To: Alex Estrin; Fab Tillier; ofw at lists.openfabrics.org
Subject: RE: [ofw] RE: Is there a race in __endpt_mgr_add_local?
Actually things are somewhat more complicated, than what I thought.
There are 3 lists in our code. This are guarded by two locks
The reader lock and the object_lock.
Object_lock is being used to syncronize write to these lists.
So, after some more code reviewing, it seems that I have found the
following errors:
1) on ipoib_port_deref we have been using the object after it was
already released. (debug only)
2) On many places we have been using cl_obj_destroy( &p_port->obj );
Instead of ipoib_port_deref( p_port, ref_port_destroying);
To my understanding this is an error as we might be deleting something
that someone else is using (referance is not zero).
3) The function ipoib_port_flush_endpts was never used. (I want to
remove it).
4) on the function __bcast_cb call take the lock before calling
__endpt_mgr_add_loca.
5) In the function __mcast_cb we have been inserting to the list without
the readers lock.
6) In the function port_do_mcast_garbage the lists are being changed
wihtout the readers lock.
Attached is the first draft of the patche in order to have community
comments:
I'm still running test to see if things are improved.
And last thing stan, I don't think that it is related to your bug. (I'm
also working on looking at this bug as well).
Thanks
Tzachi
Index: ipoib_debug.h
===================================================================
--- ipoib_debug.h (revision 3369)
+++ ipoib_debug.h (working copy)
@@ -292,7 +292,8 @@
ref_mcast_av_failed = 400 + ref_join_mcast,
ref_mcast_join_failed = 500 + ref_join_mcast,
- ref_port_info_cb = 100 + ref_port_up
+ ref_port_info_cb = 100 + ref_port_up,
+ ref_port_destroying = 200 + ref_port_up
};
Index: ipoib_endpoint.h
===================================================================
--- ipoib_endpoint.h (revision 3369)
+++ ipoib_endpoint.h (working copy)
@@ -60,7 +60,7 @@
net16_t dlid;
net32_t qpn;
ib_av_handle_t h_av;
- boolean_t expired;
+// boolean_t expired; // remove
???????
ib_al_ifc_t *p_ifc;
boolean_t is_in_use;
boolean_t is_mcast_listener;
@@ -140,7 +140,7 @@
* or trying to send data to that endpoint. Clear the expired
flag
* to prevent the AV from being flushed.
*/
- p_endpt->expired = FALSE;
+// p_endpt->expired = FALSE;
}
Index: ipoib_port.c
===================================================================
--- ipoib_port.c (revision 3369)
+++ ipoib_port.c (working copy)
@@ -495,8 +495,6 @@
inline void ipoib_port_deref(ipoib_port_t * p_port, int type)
{
- cl_obj_deref( &p_port->obj );
-
#if DBG
cl_atomic_dec( &p_port->ref[type % ref_mask] );
IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_OBJ,
@@ -504,6 +502,8 @@
#else
UNREFERENCED_PARAMETER(type);
#endif
+ cl_obj_deref( &p_port->obj );
+
}
/* function returns pointer to payload that is going after IP header.
@@ -575,7 +575,7 @@
CL_ASSERT( p_port->p_adapter );
CL_ASSERT( !p_port->p_adapter->p_port );
- cl_obj_destroy( &p_port->obj );
+ ipoib_port_deref( p_port, ref_port_destroying);
IPOIB_EXIT( IPOIB_DBG_INIT );
}
@@ -3954,15 +3954,15 @@
p_desc->wr.ds_array = p_desc->local_ds;
- p_desc->wr.dgrm.ud.remote_qp = p_desc->p_endpt->qpn;
+ p_desc->wr.dgrm.ud.remote_qp = p_desc->p_endpt1->qpn;
p_desc->wr.dgrm.ud.remote_qkey = p_port->ib_mgr.bcast_rec.qkey;
- p_desc->wr.dgrm.ud.h_av = p_desc->p_endpt->h_av;
+ p_desc->wr.dgrm.ud.h_av = p_desc->p_endpt1->h_av;
p_desc->wr.dgrm.ud.pkey_index = p_port->pkey_index;
p_desc->wr.dgrm.ud.rsvd = NULL;
/* Store context in our reserved area of the packet. */
IPOIB_PORT_FROM_PACKET( p_desc->p_pkt ) = p_port;
- IPOIB_ENDPT_FROM_PACKET( p_desc->p_pkt ) = p_desc->p_endpt;
+ IPOIB_ENDPT_FROM_PACKET( p_desc->p_pkt ) = p_desc->p_endpt1;
IPOIB_SEND_FROM_PACKET( p_desc->p_pkt ) = p_desc->p_buf;
IPOIB_EXIT( IPOIB_DBG_SEND );
@@ -3983,8 +3983,8 @@
p_desc->p_pkt, status );
ipoib_inc_send_stat( p_port->p_adapter, IP_STAT_ERROR, 0 );
/* Deref the endpoint. */
- if( p_desc->p_endpt )
- ipoib_endpt_deref( p_desc->p_endpt );
+ if( p_desc->p_endpt1 )
+ ipoib_endpt_deref( p_desc->p_endpt1 );
if( p_desc->p_buf )
{
@@ -4044,7 +4044,7 @@
for( i = 0; i < num_packets; i++ )
{
desc.p_pkt = p_packet_array[i];
- desc.p_endpt = NULL;
+ desc.p_endpt1 = NULL;
desc.p_buf = NULL;
/* Get the ethernet header so we can find the endpoint.
*/
@@ -4107,7 +4107,7 @@
p_eth_hdr->dst.addr[3] = ((unsigned
char*)&p_ip_hdr->dst_ip)[1];
}
h_end:
- status = __send_mgr_queue( p_port, p_eth_hdr,
&desc.p_endpt );
+ status = __send_mgr_queue( p_port, p_eth_hdr,
&desc.p_endpt1 );
cl_perf_stop( &p_port->p_adapter->perf, SendMgrQueue );
if( status == NDIS_STATUS_PENDING )
{
@@ -4216,7 +4216,7 @@
desc.p_pkt = IPOIB_PACKET_FROM_LIST_ITEM(
cl_qlist_remove_head(
&p_port->send_mgr.pending_list ) );
- desc.p_endpt = NULL;
+ desc.p_endpt1 = NULL;
desc.p_buf = NULL;
/* Get the ethernet header so we can find the endpoint.
*/
@@ -4231,10 +4231,11 @@
}
cl_perf_start( GetEndpt );
- status = __endpt_mgr_ref( p_port, p_eth_hdr->dst,
&desc.p_endpt );
+ status = __endpt_mgr_ref( p_port, p_eth_hdr->dst,
&desc.p_endpt1 );
cl_perf_stop( &p_port->p_adapter->perf, GetEndpt );
if( status == NDIS_STATUS_PENDING )
{
+ CL_ASSERT(desc.p_endpt == NULL);
cl_qlist_insert_head(
&p_port->send_mgr.pending_list,
IPOIB_LIST_ITEM_FROM_PACKET( desc.p_pkt
) );
break;
@@ -4242,6 +4243,7 @@
else if( status != NDIS_STATUS_SUCCESS )
{
ASSERT( status ==
NDIS_STATUS_NO_ROUTE_TO_DESTINATION );
+ CL_ASSERT(desc.p_endpt == NULL);
if( ETH_IS_MULTICAST( p_eth_hdr->dst.addr ) )
{
@@ -4618,8 +4620,11 @@
/* Destroy all multicast endpoints now that we have released the
lock. */
while( cl_qlist_count( &mc_list ) )
{
- cl_obj_destroy( &PARENT_STRUCT( cl_qlist_remove_head(
&mc_list ),
- ipoib_endpt_t, mac_item.pool_item.list_item
)->obj );
+ cl_list_item_t *p_item;
+ p_item = cl_qlist_remove_head( &mc_list );
+ p_endpt = PARENT_STRUCT(p_item, ipoib_endpt_t,
mac_item.pool_item.list_item);
+ ipoib_endpt_deref( p_endpt );
+
}
IPOIB_EXIT( IPOIB_DBG_ENDPT );
@@ -4639,8 +4644,11 @@
{
IPOIB_ENTER( IPOIB_DBG_ENDPT );
+ /* This function must be called from the recieve path */
+ CL_ASSERT(p_port->endpt_rdr > 0);
+
cl_obj_lock( &p_port->obj );
- /* Wait for all readers to complete. */
+ /* Wait for all readers to complete. */
while( p_port->endpt_rdr > 1 )
;
@@ -4660,7 +4668,7 @@
cl_obj_unlock( &p_port->obj );
- cl_obj_destroy( &p_endpt->obj );
+ ipoib_endpt_deref( p_endpt );
IPOIB_EXIT( IPOIB_DBG_ENDPT );
}
@@ -4974,7 +4982,7 @@
if( cl_status != CL_SUCCESS )
{
- cl_obj_destroy( &p_endpt->obj );
+ ipoib_endpt_deref(p_endpt);
return IB_INVALID_STATE;
}
@@ -5094,7 +5102,7 @@
}
cl_obj_unlock( &p_port->obj );
- cl_obj_destroy( &p_endpt->obj );
+ ipoib_endpt_deref( p_endpt );
#if DBG
cl_atomic_dec( &p_port->ref[ref_endpt_track] );
IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_ENDPT,
@@ -5110,7 +5118,7 @@
IPOIB_EXIT( IPOIB_DBG_ENDPT );
}
-
+#if 0
void
ipoib_port_flush_endpts(
IN ipoib_port_t* const
p_port )
@@ -5153,8 +5161,8 @@
IPOIB_EXIT( IPOIB_DBG_ENDPT );
}
+#endif
-
/*
* The sequence for port up is as follows:
* 1. The port goes active. This allows the adapter to send SA
queries
@@ -5741,9 +5749,16 @@
IPOIB_EXIT( IPOIB_DBG_INIT );
return;
}
- cl_obj_unlock( &p_port->obj );
p_port->bc_join_retry_cnt = 0;
+ while( p_port->endpt_rdr )
+ {
+ cl_obj_unlock( &p_port->obj );
+ cl_obj_lock( &p_port->obj );
+ }
+ ;
+// cl_obj_unlock( &p_port->obj );
+
tmp = cl_atomic_inc(&p_port->m_Counter);
if (tmp !=1) {
CL_ASSERT(FALSE);
@@ -5757,7 +5772,7 @@
cl_memclr(&port_info, sizeof(port_info));
port_info.base_lid = p_port->base_lid;
status = __endpt_mgr_add_local( p_port, &port_info );
-
+ cl_obj_unlock( &p_port->obj );
if( status != IB_SUCCESS )
{
IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,
@@ -5768,6 +5783,10 @@
}
}
cl_atomic_dec(&p_port->m_Counter);
+
+ cl_obj_unlock( &p_port->obj );
+
+
status = __endpt_mgr_add_bcast( p_port, p_mcast_rec );
if( status != IB_SUCCESS )
{
@@ -6044,10 +6063,10 @@
("Invalid state - Aborting.\n") );
return;
}
- cl_obj_unlock( &p_port->obj );
if( p_mcast_rec->status != IB_SUCCESS )
{
+ cl_obj_unlock( &p_port->obj );
IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,
("Multicast join request failed with status
%s.\n",
p_port->p_adapter->p_ifc->get_err_str(
p_mcast_rec->status )) );
@@ -6058,7 +6077,6 @@
return;
}
- cl_obj_lock( &p_port->obj );
p_item = cl_fmap_get(
&p_port->endpt_mgr.gid_endpts,
&p_mcast_rec->p_member_rec->mgid );
if( p_item == cl_fmap_end( &p_port->endpt_mgr.gid_endpts ) )
@@ -6104,6 +6122,9 @@
*/
if( p_endpt->dlid )
{
+ while( p_port->endpt_rdr )
+ ;
+
p_qitem = cl_qmap_insert(
&p_port->endpt_mgr.lid_endpts, p_endpt->dlid,
&p_endpt->lid_item );
CL_ASSERT( p_qitem == &p_endpt->lid_item );
@@ -6333,6 +6354,9 @@
cl_qlist_init( &destroy_mc_list );
cl_obj_lock( &p_port->obj );
+ /* Wait for all readers to finish */
+ while( p_port->endpt_rdr )
+ ;
cnt = 0;
p_item = cl_qmap_head( &p_port->endpt_mgr.mac_endpts );
while( (p_item != cl_qmap_end( &p_port->endpt_mgr.mac_endpts ))
&& (cnt < GC_MAX_LEAVE_NUM))
@@ -6381,9 +6405,7 @@
p_endpt->mac.addr[3],
p_endpt->mac.addr[4],
p_endpt->mac.addr[5]) );
-
- cl_obj_destroy( &PARENT_STRUCT( cl_qlist_remove_head(
&destroy_mc_list ),
- ipoib_endpt_t, mac_item.pool_item.list_item
)->obj );
+ ipoib_endpt_deref( p_endpt );
}
}
Index: ipoib_port.h
===================================================================
--- ipoib_port.h (revision 3369)
+++ ipoib_port.h (working copy)
@@ -378,7 +378,7 @@
typedef struct _ipoib_send_desc
{
NDIS_PACKET *p_pkt;
- ipoib_endpt_t *p_endpt;
+ ipoib_endpt_t *p_endpt1;
send_buf_t *p_buf;
ib_send_wr_t wr;
ipoib_hdr_t pkt_hdr;
@@ -596,10 +596,14 @@
IN ipoib_port_t* const
p_port,
IN const mac_addr_t
mac );
+
+#if 0
void
ipoib_port_flush_endpts(
IN ipoib_port_t* const
p_port );
+#endif
+
void
ipoib_port_send(
IN ipoib_port_t* const
p_port,
> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> Sent: Friday, October 24, 2008 4:22 PM
> To: Fab Tillier; Tzachi Dar; ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: Is there a race in __endpt_mgr_add_local?
>
> Yes, it should be guarded with port lock.
>
> Thanks,
> Alex.
>
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org
> > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > Sent: Thursday, October 23, 2008 6:57 PM
> > To: Tzachi Dar; ofw at lists.openfabrics.org
> > Subject: [ofw] RE: Is there a race in __endpt_mgr_add_local?
> >
> > It probably needs a lock. I don't have access to my dev
> box right now
> > so can't look for sure.
> >
> > -Fab
> >
> > >From: ofw-bounces at lists.openfabrics.org
> > >[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Tzachi Dar
> > >Sent: Thursday, October 23, 2008 3:51 PM
> > >To: Fab Tillier; ofw at lists.openfabrics.org
> > >Subject: [ofw] Is there a race in __endpt_mgr_add_local?
> > >
> > >Hi Fab,
> > >
> > >The function __endpt_mgr_add_local is being called from
> > __bcast_cb without the port lock being hold.
> > >
> > >It later calls __endpt_mgr_insert (again without the lock).
> > >
> > >I believe that this is a bug, is there anyone who disagrees?
> > >
> > >Thanks
> > >Tzachi
> > >
> > _______________________________________________
> > ofw mailing list
> > ofw at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20081026/e52bd4b5/attachment.html>
More information about the ofw
mailing list