<HTML dir=ltr><HEAD><TITLE>RE: [ofw] RE: Is there a race in __endpt_mgr_add_local?</TITLE>
<META http-equiv=Content-Type content="text/html; charset=unicode">
<META content="MSHTML 6.00.6001.18148" name=GENERATOR></HEAD>
<BODY>
<DIV id=idOWAReplyText71600 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>Hi Tzachi,</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>Here are my thoughts on the points you have described.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>(1)(2) </FONT><FONT face=Arial color=#000000 size=2>ipoib_port_deref() call is expected only as a completion call for async calls when ipoib_port_ref called first to prevent port obj go away (due to miniport reset, unload etc..) until async call completes.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>If you observed ..port_deref() that  destroyed port obj, than most likely port_deref() doesn't have matched port_ref(). </FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>Could it be introduced by "local mad" patch( please see (4) ? I don't have code available right now, so can't be sure .</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>(4)<FONT face="Times New Roman">__endpt_mgr_add_local() was moved to __bcast_cb() as a result of implementing patch "local mad". Patch eliminated port info SA query along with it's completion call _port_info_cb() where ...add_local() used to be. </FONT></FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2><FONT face="Times New Roman">In bcast_cb() port obj  lock is released before __endpt_mgr_add_local() called. I believe lock should be released after the call.</FONT></FONT></DIV>
<DIV dir=ltr><FONT size=2>(5)(6) agree.</FONT></DIV>
<DIV dir=ltr><FONT size=2>
<DIV dir=ltr><FONT face=Arial size=2>(3) is not relevant to the problem.Probaly can be done separately?</FONT></DIV>
<DIV dir=ltr><FONT face=Arial></FONT> </DIV></FONT></DIV>
<DIV dir=ltr><FONT size=2>Thanks,</FONT></DIV>
<DIV dir=ltr><FONT size=2>Alex.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV></DIV>
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Tzachi Dar [mailto:tzachid@mellanox.co.il]<BR><B>Sent:</B> Sun 10/26/2008 1:01 PM<BR><B>To:</B> Alex Estrin; Fab Tillier; ofw@lists.openfabrics.org<BR><B>Subject:</B> RE: [ofw] RE: Is there a race in __endpt_mgr_add_local?<BR></FONT><BR></DIV>
<DIV>
<P><FONT size=2>Actually things are somewhat more complicated, than what I thought.<BR><BR>There are 3 lists in our code. This are guarded by two locks<BR>The reader lock and the object_lock.<BR><BR>Object_lock is being used to syncronize write to these lists.<BR><BR>So, after some more code reviewing, it seems that I have found the<BR>following errors:<BR><BR>1) on ipoib_port_deref we have been using the object after it was<BR>already released. (debug only)<BR></FONT><FONT size=2><BR>2) On many places we have been using cl_obj_destroy( &p_port->obj );<BR>Instead of ipoib_port_deref( p_port, ref_port_destroying);<BR>To my understanding this is an error as we might be deleting something<BR>that someone else is using (referance is not zero).<BR><BR>3) The function ipoib_port_flush_endpts was never used. (I want to<BR>remove it).<BR><BR>4) on the function __bcast_cb call take the lock before calling<BR>__endpt_mgr_add_loca.<BR><BR>5) In the function __mcast_cb we have been inserting to the list without<BR>the readers lock.<BR><BR>6) In the function port_do_mcast_garbage the lists are being changed<BR>wihtout the readers lock.<BR><BR>Attached is the first draft of the patche in order to have community<BR>comments:<BR><BR>I'm still running test to see if things are improved.<BR><BR>And last thing stan, I don't think that it is related to your bug. (I'm<BR>also working on looking at this bug as well).<BR><BR>Thanks<BR>Tzachi<BR><BR>Index: ipoib_debug.h<BR>===================================================================<BR>--- ipoib_debug.h       (revision 3369)<BR>+++ ipoib_debug.h       (working copy)<BR>@@ -292,7 +292,8 @@<BR>        ref_mcast_av_failed = 400 + ref_join_mcast,<BR>        ref_mcast_join_failed = 500 + ref_join_mcast,<BR><BR>-       ref_port_info_cb = 100 + ref_port_up<BR>+       ref_port_info_cb = 100 + ref_port_up,<BR>+       ref_port_destroying = 200 + ref_port_up<BR><BR> };<BR><BR>Index: ipoib_endpoint.h<BR>===================================================================<BR>--- ipoib_endpoint.h    (revision 3369)<BR>+++ ipoib_endpoint.h    (working copy)<BR>@@ -60,7 +60,7 @@<BR>        net16_t                                 dlid;<BR>        net32_t                                 qpn;<BR>        ib_av_handle_t                  h_av;<BR>-       boolean_t                               expired;<BR>+//     boolean_t                               expired; // remove<BR>???????<BR>        ib_al_ifc_t                             *p_ifc;<BR>        boolean_t                       is_in_use;<BR>        boolean_t                               is_mcast_listener;<BR>@@ -140,7 +140,7 @@<BR>         * or trying to send data to that endpoint.  Clear the expired<BR>flag<BR>         * to prevent the AV from being flushed.<BR>         */<BR>-       p_endpt->expired = FALSE;<BR>+//     p_endpt->expired = FALSE;<BR> }<BR><BR><BR>Index: ipoib_port.c<BR>===================================================================<BR>--- ipoib_port.c        (revision 3369)<BR>+++ ipoib_port.c        (working copy)<BR>@@ -495,8 +495,6 @@<BR><BR> inline void ipoib_port_deref(ipoib_port_t * p_port, int type)<BR> {<BR>-       cl_obj_deref( &p_port->obj );<BR>-<BR> #if DBG<BR>        cl_atomic_dec( &p_port->ref[type % ref_mask] );<BR>        IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_OBJ,<BR>@@ -504,6 +502,8 @@<BR> #else<BR>        UNREFERENCED_PARAMETER(type);<BR> #endif<BR>+       cl_obj_deref( &p_port->obj );<BR>+<BR> }<BR><BR> /* function returns pointer to payload that is going after IP header.<BR>@@ -575,7 +575,7 @@<BR>        CL_ASSERT( p_port->p_adapter );<BR>        CL_ASSERT( !p_port->p_adapter->p_port );<BR><BR>-       cl_obj_destroy( &p_port->obj );<BR>+       ipoib_port_deref( p_port, ref_port_destroying);<BR><BR>        IPOIB_EXIT( IPOIB_DBG_INIT );<BR> }<BR>@@ -3954,15 +3954,15 @@<BR>       <BR>        p_desc->wr.ds_array = p_desc->local_ds;<BR><BR>-       p_desc->wr.dgrm.ud.remote_qp = p_desc->p_endpt->qpn;<BR>+       p_desc->wr.dgrm.ud.remote_qp = p_desc->p_endpt1->qpn;<BR>        p_desc->wr.dgrm.ud.remote_qkey = p_port->ib_mgr.bcast_rec.qkey;<BR>-       p_desc->wr.dgrm.ud.h_av = p_desc->p_endpt->h_av;<BR>+       p_desc->wr.dgrm.ud.h_av = p_desc->p_endpt1->h_av;<BR>        p_desc->wr.dgrm.ud.pkey_index = p_port->pkey_index;<BR>        p_desc->wr.dgrm.ud.rsvd = NULL;<BR><BR>        /* Store context in our reserved area of the packet. */<BR>        IPOIB_PORT_FROM_PACKET( p_desc->p_pkt ) = p_port;<BR>-       IPOIB_ENDPT_FROM_PACKET( p_desc->p_pkt ) = p_desc->p_endpt;<BR>+       IPOIB_ENDPT_FROM_PACKET( p_desc->p_pkt ) = p_desc->p_endpt1;<BR>        IPOIB_SEND_FROM_PACKET( p_desc->p_pkt ) = p_desc->p_buf;<BR><BR>        IPOIB_EXIT( IPOIB_DBG_SEND );<BR>@@ -3983,8 +3983,8 @@<BR>                p_desc->p_pkt, status );<BR>        ipoib_inc_send_stat( p_port->p_adapter, IP_STAT_ERROR, 0 );<BR>        /* Deref the endpoint. */<BR>-       if( p_desc->p_endpt )<BR>-               ipoib_endpt_deref( p_desc->p_endpt );<BR>+       if( p_desc->p_endpt1 )<BR>+               ipoib_endpt_deref( p_desc->p_endpt1 );<BR><BR>        if( p_desc->p_buf )<BR>        {<BR>@@ -4044,7 +4044,7 @@<BR>        for( i = 0; i < num_packets; i++ )<BR>        {<BR>                desc.p_pkt = p_packet_array[i];<BR>-               desc.p_endpt = NULL;<BR>+               desc.p_endpt1 = NULL;<BR>                desc.p_buf = NULL;<BR><BR>                /* Get the ethernet header so we can find the endpoint.<BR>*/<BR>@@ -4107,7 +4107,7 @@<BR>                        p_eth_hdr->dst.addr[3] = ((unsigned<BR>char*)&p_ip_hdr->dst_ip)[1];<BR>                }<BR> h_end:<BR>-               status = __send_mgr_queue( p_port, p_eth_hdr,<BR>&desc.p_endpt );<BR>+               status = __send_mgr_queue( p_port, p_eth_hdr,<BR>&desc.p_endpt1 );<BR>                cl_perf_stop( &p_port->p_adapter->perf, SendMgrQueue );<BR>                if( status == NDIS_STATUS_PENDING )<BR>                {<BR>@@ -4216,7 +4216,7 @@<BR><BR>                desc.p_pkt = IPOIB_PACKET_FROM_LIST_ITEM(<BR>                        cl_qlist_remove_head(<BR>&p_port->send_mgr.pending_list ) );<BR>-               desc.p_endpt = NULL;<BR>+               desc.p_endpt1 = NULL;<BR>                desc.p_buf = NULL;<BR><BR>                /* Get the ethernet header so we can find the endpoint.<BR>*/<BR>@@ -4231,10 +4231,11 @@<BR>                }<BR><BR>                cl_perf_start( GetEndpt );<BR>-               status = __endpt_mgr_ref( p_port, p_eth_hdr->dst,<BR>&desc.p_endpt );<BR>+               status = __endpt_mgr_ref( p_port, p_eth_hdr->dst,<BR>&desc.p_endpt1 );<BR>                cl_perf_stop( &p_port->p_adapter->perf, GetEndpt );<BR>                if( status == NDIS_STATUS_PENDING )<BR>                {<BR>+                       CL_ASSERT(desc.p_endpt == NULL);<BR>                        cl_qlist_insert_head(<BR>&p_port->send_mgr.pending_list,<BR>                                IPOIB_LIST_ITEM_FROM_PACKET( desc.p_pkt<BR>) );<BR>                        break;<BR>@@ -4242,6 +4243,7 @@<BR>                else if( status != NDIS_STATUS_SUCCESS )<BR>                {<BR>                        ASSERT( status ==<BR>NDIS_STATUS_NO_ROUTE_TO_DESTINATION );<BR>+                       CL_ASSERT(desc.p_endpt == NULL);<BR><BR>                        if( ETH_IS_MULTICAST( p_eth_hdr->dst.addr ) )<BR>                        {<BR>@@ -4618,8 +4620,11 @@<BR>        /* Destroy all multicast endpoints now that we have released the<BR>lock. */<BR>        while( cl_qlist_count( &mc_list ) )<BR>        {<BR>-               cl_obj_destroy( &PARENT_STRUCT( cl_qlist_remove_head(<BR>&mc_list ),<BR>-                       ipoib_endpt_t, mac_item.pool_item.list_item<BR>)->obj );<BR>+        cl_list_item_t *p_item;<BR>+        p_item = cl_qlist_remove_head( &mc_list );<BR>+        p_endpt = PARENT_STRUCT(p_item, ipoib_endpt_t,<BR>mac_item.pool_item.list_item);<BR>+        ipoib_endpt_deref( p_endpt );<BR>+<BR>        }<BR><BR>        IPOIB_EXIT( IPOIB_DBG_ENDPT );<BR>@@ -4639,8 +4644,11 @@<BR> {<BR>        IPOIB_ENTER( IPOIB_DBG_ENDPT );<BR><BR>+       /* This function must be called from the recieve path */<BR>+       CL_ASSERT(p_port->endpt_rdr > 0);<BR>+<BR>        cl_obj_lock( &p_port->obj );<BR>-       /* Wait for all readers to complete. */<BR>+       /* Wait for all readers to complete. */   <BR>        while( p_port->endpt_rdr > 1 )<BR>                ;<BR><BR>@@ -4660,7 +4668,7 @@<BR><BR>        cl_obj_unlock( &p_port->obj );<BR><BR>-       cl_obj_destroy( &p_endpt->obj );<BR>+       ipoib_endpt_deref( p_endpt );<BR><BR>        IPOIB_EXIT( IPOIB_DBG_ENDPT );<BR> }<BR>@@ -4974,7 +4982,7 @@<BR><BR>        if( cl_status != CL_SUCCESS )<BR>        {<BR>-               cl_obj_destroy( &p_endpt->obj );<BR>+               ipoib_endpt_deref(p_endpt);<BR>                return IB_INVALID_STATE;<BR>        }<BR><BR>@@ -5094,7 +5102,7 @@<BR>                }<BR><BR>                cl_obj_unlock( &p_port->obj );<BR>-               cl_obj_destroy( &p_endpt->obj );<BR>+               ipoib_endpt_deref( p_endpt );<BR> #if DBG<BR>                cl_atomic_dec( &p_port->ref[ref_endpt_track] );<BR>                IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_ENDPT,<BR>@@ -5110,7 +5118,7 @@<BR>        IPOIB_EXIT( IPOIB_DBG_ENDPT );<BR> }<BR><BR>-<BR>+#if 0<BR> void<BR> ipoib_port_flush_endpts(<BR>        IN                              ipoib_port_t* const<BR>p_port )<BR>@@ -5153,8 +5161,8 @@<BR><BR>        IPOIB_EXIT( IPOIB_DBG_ENDPT );<BR> }<BR>+#endif<BR><BR>-<BR> /*<BR>  * The sequence for port up is as follows:<BR>  *     1. The port goes active.  This allows the adapter to send SA<BR>queries<BR>@@ -5741,9 +5749,16 @@<BR>                IPOIB_EXIT( IPOIB_DBG_INIT );<BR>                return;<BR>        }<BR>-       cl_obj_unlock( &p_port->obj );<BR>        p_port->bc_join_retry_cnt = 0;<BR><BR>+       while( p_port->endpt_rdr )<BR>+       {<BR>+               cl_obj_unlock( &p_port->obj );<BR>+               cl_obj_lock( &p_port->obj );<BR>+       }<BR>+            ;<BR>+//     cl_obj_unlock( &p_port->obj );<BR>+<BR>        tmp = cl_atomic_inc(&p_port->m_Counter);<BR>        if (tmp !=1) {<BR>                CL_ASSERT(FALSE);<BR>@@ -5757,7 +5772,7 @@<BR>                cl_memclr(&port_info, sizeof(port_info));<BR>                port_info.base_lid = p_port->base_lid;<BR>                status = __endpt_mgr_add_local( p_port, &port_info );<BR>-<BR>+               cl_obj_unlock( &p_port->obj );<BR>                if( status != IB_SUCCESS )<BR>                {<BR>                        IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,<BR>@@ -5768,6 +5783,10 @@<BR>                }<BR>        }<BR>     cl_atomic_dec(&p_port->m_Counter);<BR>+<BR>+       cl_obj_unlock( &p_port->obj );<BR>+<BR>+<BR>        status = __endpt_mgr_add_bcast( p_port, p_mcast_rec );<BR>        if( status != IB_SUCCESS )<BR>        {<BR>@@ -6044,10 +6063,10 @@<BR>                        ("Invalid state - Aborting.\n") );<BR>                return;<BR>        }<BR>-       cl_obj_unlock( &p_port->obj );<BR><BR>        if( p_mcast_rec->status != IB_SUCCESS )<BR>        {<BR>+               cl_obj_unlock( &p_port->obj );<BR>                IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,<BR>                        ("Multicast join request failed with status<BR>%s.\n",<BR>                        p_port->p_adapter->p_ifc->get_err_str(<BR>p_mcast_rec->status )) );<BR>@@ -6058,7 +6077,6 @@<BR>                return;<BR>        }<BR><BR>-       cl_obj_lock( &p_port->obj );<BR>        p_item = cl_fmap_get(<BR>                &p_port->endpt_mgr.gid_endpts,<BR>&p_mcast_rec->p_member_rec->mgid );<BR>        if( p_item == cl_fmap_end( &p_port->endpt_mgr.gid_endpts ) )<BR>@@ -6104,6 +6122,9 @@<BR>         */<BR>        if( p_endpt->dlid )<BR>        {<BR>+               while( p_port->endpt_rdr )<BR>+                       ;<BR>+   <BR>                p_qitem = cl_qmap_insert(<BR>                        &p_port->endpt_mgr.lid_endpts, p_endpt->dlid,<BR>&p_endpt->lid_item );<BR>                CL_ASSERT( p_qitem == &p_endpt->lid_item );<BR>@@ -6333,6 +6354,9 @@<BR>        cl_qlist_init( &destroy_mc_list );<BR><BR>        cl_obj_lock( &p_port->obj );<BR>+       /* Wait for all readers to finish */<BR>+       while( p_port->endpt_rdr )<BR>+        ;   <BR>        cnt = 0;<BR>        p_item = cl_qmap_head( &p_port->endpt_mgr.mac_endpts );<BR>        while( (p_item != cl_qmap_end( &p_port->endpt_mgr.mac_endpts ))<BR>&& (cnt < GC_MAX_LEAVE_NUM))<BR>@@ -6381,9 +6405,7 @@<BR>                                 p_endpt->mac.addr[3],<BR>                                 p_endpt->mac.addr[4],<BR>                                 p_endpt->mac.addr[5]) );<BR>-<BR>-               cl_obj_destroy( &PARENT_STRUCT( cl_qlist_remove_head(<BR>&destroy_mc_list ),<BR>-                       ipoib_endpt_t, mac_item.pool_item.list_item<BR>)->obj );<BR>+               ipoib_endpt_deref( p_endpt );<BR>        }<BR> }<BR><BR>Index: ipoib_port.h<BR>===================================================================<BR>--- ipoib_port.h        (revision 3369)<BR>+++ ipoib_port.h        (working copy)<BR>@@ -378,7 +378,7 @@<BR> typedef struct _ipoib_send_desc<BR> {<BR>        NDIS_PACKET                     *p_pkt;<BR>-       ipoib_endpt_t           *p_endpt;<BR>+       ipoib_endpt_t           *p_endpt1;<BR>        send_buf_t                      *p_buf;<BR>        ib_send_wr_t            wr;<BR>        ipoib_hdr_t                     pkt_hdr;<BR>@@ -596,10 +596,14 @@<BR>        IN                              ipoib_port_t* const<BR>p_port,<BR>        IN              const   mac_addr_t<BR>mac );<BR><BR>+<BR>+#if 0<BR> void<BR> ipoib_port_flush_endpts(<BR>        IN                              ipoib_port_t* const<BR>p_port );<BR>+#endif<BR><BR>+<BR> void<BR> ipoib_port_send(<BR>        IN                              ipoib_port_t* const<BR>p_port,<BR><BR>> -----Original Message-----<BR>> From: Alex Estrin [<A href="mailto:alex.estrin@qlogic.com">mailto:alex.estrin@qlogic.com</A>]<BR>> Sent: Friday, October 24, 2008 4:22 PM<BR>> To: Fab Tillier; Tzachi Dar; ofw@lists.openfabrics.org<BR>> Subject: RE: [ofw] RE: Is there a race in __endpt_mgr_add_local?<BR>><BR>> Yes, it should be guarded with port lock.<BR>><BR>> Thanks,<BR>> Alex.<BR>><BR>> > -----Original Message-----<BR>> > From: ofw-bounces@lists.openfabrics.org<BR>> > [<A href="mailto:ofw-bounces@lists.openfabrics.org">mailto:ofw-bounces@lists.openfabrics.org</A>] On Behalf Of Fab Tillier<BR>> > Sent: Thursday, October 23, 2008 6:57 PM<BR>> > To: Tzachi Dar; ofw@lists.openfabrics.org<BR>> > Subject: [ofw] RE: Is there a race in __endpt_mgr_add_local?<BR>> ><BR>> > It probably needs a lock.  I don't have access to my dev<BR>> box right now<BR>> > so can't look for sure.<BR>> ><BR>> > -Fab<BR>> ><BR>> > >From: ofw-bounces@lists.openfabrics.org<BR>> > >[<A href="mailto:ofw-bounces@lists.openfabrics.org">mailto:ofw-bounces@lists.openfabrics.org</A>] On Behalf Of Tzachi Dar<BR>> > >Sent: Thursday, October 23, 2008 3:51 PM<BR>> > >To: Fab Tillier; ofw@lists.openfabrics.org<BR>> > >Subject: [ofw] Is there a race in __endpt_mgr_add_local?<BR>> > ><BR>> > >Hi Fab,<BR>> > ><BR>> > >The function __endpt_mgr_add_local is being called from<BR>> > __bcast_cb without the port lock being hold.<BR>> > ><BR>> > >It later calls __endpt_mgr_insert (again without the lock).<BR>> > ><BR>> > >I believe that this is a bug, is there anyone who disagrees?<BR>> > ><BR>> > >Thanks<BR>> > >Tzachi<BR>> > ><BR>> > _______________________________________________<BR>> > ofw mailing list<BR>> > ofw@lists.openfabrics.org<BR>> > <A href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw</A><BR>> ><BR>><BR></P></FONT></DIV></BODY></HTML>