[ofw] RE: Patch: [ipoib] Limit the time that is being spent in a DPC (don't handle more than 128 packets in one recv call)

Tzachi Dar tzachid at mellanox.co.il
Mon Nov 10 08:56:50 PST 2008


Yes, as far as we have been able to test.

Applied in revisions 1743/4.

Thanks
Tzachi 

> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at windows.microsoft.com] 
> Sent: Monday, November 10, 2008 4:22 AM
> To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
> Subject: RE: Patch: [ipoib] Limit the time that is being 
> spent in aDPC(don't handle more than 128 packet at one recv call)
> 
> Ah, yes, that makes sense.  Does it work? :)
> 
> > -----Original Message-----
> > From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> > Sent: Sunday, November 09, 2008 1:52 PM
> > To: Fab Tillier; Sean Hefty; ofw at lists.openfabrics.org
> > Subject: RE: Patch: [ipoib] Limit the time that is being spent in 
> > aDPC(don't handle more than 128 packet at one recv call)
> >
> > Hi Fab,
> >
> > I was trying to do what you suggested:
> >
> > The function recv_cb takes a reference when it is called (even before
> > I have touched it).
> >
> > At the end of this function, I need to decide whether to rearm or to
> > enter a new DPC.
> > In the case that I arm, I remove the reference.
> >
> > In the case that I run the DPC, I don't remove the reference. It is
> > only removed in the function __recv_cb_dpc, after I have called the
> > function __recv_cb.
> >
> > Does this make sense?
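> >
> > As a rough sketch of that reference flow (this only condenses the patch
> > quoted below; treat it as an illustration, not the exact code):
> >
> >     /* Tail of __recv_cb(): the reference taken at entry either goes
> >      * away here (rearm path) or is carried into the DPC. */
> >     if( p_free )
> >     {
> >         /* CQ drained: rearm and drop the reference. */
> >         p_port->p_adapter->p_ifc->rearm_cq( p_port->ib_mgr.h_recv_cq, FALSE );
> >         ipoib_port_deref( p_port, ref_recv_cb );
> >     }
> >     else
> >     {
> >         /* Hit the 128-packet budget: keep the reference and finish in the DPC. */
> >         KeInsertQueueDpc( &p_port->recv_dpc, NULL, NULL );
> >     }
> >
> >     /* The DPC releases the reference only after it has run __recv_cb(). */
> >     static void __recv_cb_dpc( KDPC *p_dpc, void *context, void *arg1, void *arg2 )
> >     {
> >         ipoib_port_t *p_port = context;
> >
> >         __recv_cb( NULL, p_port );
> >         ipoib_port_deref( p_port, ref_recv_cb );
> >     }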
> >
> > Thanks
> > Tzachi
> >
> >> -----Original Message-----
> >> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
> >> Sent: Sunday, November 09, 2008 10:06 PM
> >> To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
> >> Subject: RE: Patch: [ipoib] Limit the time that is being spent in 
> >> aDPC(don't handle more than 128 packet at one recv call)
> >>
> >> Hi Tzachi,
> >>
> >> I don't see any code to make sure the DPC isn't queued during cleanup.
> >> You could get a situation where the regular polling loop polled the
> >> last item and recv_cnt exceeds its limit - a DPC will get queued, but
> >> there could be no receives posted to the QP anymore.
> >>
> >> The cleanup routine should make sure the DPC isn't queued or running
> >> (maybe take a reference on the port object when the DPC is queued,
> >> one that you release after it runs?).
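> >>
> >> Something along these lines in the port cleanup path would close that
> >> window (only a sketch - recv_dpc and ref_recv_cb come from your patch,
> >> where exactly it sits in the teardown code is assumed):
> >>
> >>     /* Cancel a pending DPC; if it was still queued, release the
> >>      * reference it was holding. */
> >>     if( KeRemoveQueueDpc( &p_port->recv_dpc ) )
> >>         ipoib_port_deref( p_port, ref_recv_cb );
> >>
> >>     /* Make sure a DPC that already started running has finished
> >>      * before the port object goes away. */
> >>     KeFlushQueuedDpcs();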
> >>
> >> -Fab
> >>
> >>> -----Original Message-----
> >>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> >>> Sent: Sunday, November 09, 2008 11:27 AM
> >>> To: Fab Tillier; Sean Hefty; ofw at lists.openfabrics.org
> >>> Subject: RE: Patch: [ipoib] Limit the time that is being spent in a
> >>> DPC(don't handle more than 128 packet at one recv call)
> >>>
> >>> Actually, it doesn't work even with Mellanox HCAs.
> >>>
> >>> Attached is a fix that does work (I hope). This patch creates a DPC
> >>> that is used to complete the polling if needed.
> >>>
> >>> Thanks
> >>> Tzachi
> >>>
> >>> Index: ulp/ipoib/kernel/ipoib_port.c
> >>> ===================================================================
> >>> --- ulp/ipoib/kernel/ipoib_port.c      (revision 1737)
> >>> +++ ulp/ipoib/kernel/ipoib_port.c      (working copy)
> >>> @@ -70,6 +70,11 @@
> >>>
> >>>  static void __port_mcast_garbage_dpc(KDPC *p_gc_dpc,void *context,void *s_arg1, void *s_arg2);
> >>>  static void __port_do_mcast_garbage(ipoib_port_t* const p_port );
> >>> +
> >>> +
> >>> +static void __recv_cb_dpc(KDPC *p_gc_dpc,void *context,void *s_arg1, void *s_arg2);
> >>> +
> >>> +
> >>>
> >>>  /******************************************************************************
> >>>  *
> >>>  * Declarations
> >>> @@ -676,6 +681,10 @@
> >>>                         p_adapter->p_ifc->get_err_str( status )) );
> >>>                 return status;
> >>>         }
> >>> +
> >>> +       KeInitializeDpc(&p_port->recv_dpc,(PKDEFERRED_ROUTINE)__recv_cb_dpc,p_port);
> >>> +
> >>> +
> >>>         /* Initialize multicast garbage collector timer and DPC object */
> >>>         KeInitializeDpc(&p_port->gc_dpc,(PKDEFERRED_ROUTINE)__port_mcast_garbage_dpc,p_port);
> >>>         KeInitializeTimerEx(&p_port->gc_timer,SynchronizationTimer);
> >>> @@ -1602,7 +1611,23 @@
> >>>         IPOIB_EXIT( IPOIB_DBG_RECV );
> >>>  }
> >>>
> >>> +static void __recv_cb_dpc(KDPC *p_gc_dpc,void *context,void * s_arg1
> >>> +, void * s_arg2)
> >>> +{
> >>> +       ipoib_port_t *p_port = context;
> >>> +
> >>> +       UNREFERENCED_PARAMETER(p_gc_dpc);
> >>> +       UNREFERENCED_PARAMETER(s_arg1);
> >>> +       UNREFERENCED_PARAMETER(s_arg2);
> >>> +
> >>> +
> >>> +       __recv_cb(NULL, p_port);
> >>> +       ipoib_port_deref( p_port, ref_recv_cb );
> >>> +
> >>> +
> >>> +}
> >>> +
> >>> +
> >>>  static void
> >>>  __recv_cb(
> >>>         IN              const   ib_cq_handle_t                          h_cq,
> >>> @@ -1666,7 +1691,7 @@
> >>>                 recv_cnt += __recv_mgr_filter( p_port, p_wc,
> >>>                         &done_list, &bad_list );
> >>>                 cl_perf_stop( &p_port->p_adapter->perf, FilterRecv );
> >>>
> >>> -       } while( !p_free );
> >>> +       } while( (!p_free) && (recv_cnt < 128));
> >>>
> >>>         /* We're done looking at the endpoint map, release the reference. */
> >>>         cl_atomic_dec( &p_port->endpt_rdr );
> >>> @@ -1745,18 +1770,23 @@
> >>>         } while( pkt_cnt );
> >>>         cl_spinlock_release( &p_port->recv_lock );
> >>>
> >>> -       /*
> >>> -        * Rearm after filtering to prevent contention on the enpoint maps
> >>> -        * and eliminate the possibility of having a call to
> >>> -        * __endpt_mgr_insert find a duplicate.
> >>> -        */
> >>> -       cl_perf_start( RearmRecv );
> >>> -       status = p_port->p_adapter->p_ifc->rearm_cq(
> >>> -               p_port->ib_mgr.h_recv_cq, FALSE );
> >>> -       cl_perf_stop( &p_port->p_adapter->perf, RearmRecv );
> >>> -       CL_ASSERT( status == IB_SUCCESS );
> >>> +       if (p_free ) {
> >>> +               /*
> >>> +                * Rearm after filtering to prevent contention on the enpoint maps
> >>> +                * and eliminate the possibility of having a call to
> >>> +                * __endpt_mgr_insert find a duplicate.
> >>> +                */
> >>> +               cl_perf_start( RearmRecv );
> >>> +               status = p_port->p_adapter->p_ifc->rearm_cq(
> >>> +                       p_port->ib_mgr.h_recv_cq, FALSE );
> >>> +               cl_perf_stop( &p_port->p_adapter->perf, RearmRecv );
> >>> +               CL_ASSERT( status == IB_SUCCESS );
> >>>
> >>> -       ipoib_port_deref( p_port, ref_recv_cb );
> >>> +               ipoib_port_deref( p_port, ref_recv_cb );
> >>> +       } else {
> >>> +               // Please note the reference is still up
> >>> +               KeInsertQueueDpc(&p_port->recv_dpc, NULL, NULL);
> >>> +       }
> >>>
> >>>         cl_perf_stop( &p_port->p_adapter->perf, RecvCb );
> >>>
> >>> Index: ulp/ipoib/kernel/ipoib_port.h
> >>> ===================================================================
> >>> --- ulp/ipoib/kernel/ipoib_port.h      (revision 1724)
> >>> +++ ulp/ipoib/kernel/ipoib_port.h      (working copy)
> >>> @@ -502,6 +502,8 @@
> >>>         ipoib_recv_mgr_t                recv_mgr;
> >>>         ipoib_send_mgr_t                send_mgr;
> >>>
> >>> +       KDPC                                    recv_dpc;
> >>> +
> >>>         ipoib_endpt_mgr_t               endpt_mgr;
> >>>
> >>>         ipoib_endpt_t                   *p_local_endpt;
> >>>
> >>>> -----Original Message-----
> >>>> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
> >>>> Sent: Friday, November 07, 2008 12:56 AM
> >>>> To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
> >>>> Subject: RE: Patch: [ipoib] Limit the time that is being spent in a
> >>>> DPC(don't handle more than 128 packet at one recv call)
> >>>>
> >>>>> At the end of the recv_cb function we arm the cq again.
> >>>>>
> >>>>> After the arm a new EQE will be created and the function will be
> >>>>> called again.
> >>>>
> >>>> That will only work reliably with Mellanox HCAs, though - from an
> >>>> IB spec perspective the DPC should be requeue'd, and then poll
> >>>> until the CQ is empty, and only then rearm.
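> >>>>
> >>>> Something like this, in other words (a sketch only - poll_cq/rearm_cq
> >>>> stand in for the ib_poll_cq/ib_rearm_cq calls made through p_ifc, the
> >>>> packet budget is left out, and the re-poll after the rearm is the usual
> >>>> guard against completions that land between the last poll and the rearm):
> >>>>
> >>>>     for( ;; )
> >>>>     {
> >>>>         /* Drain the CQ completely first. */
> >>>>         while( poll_cq( h_recv_cq, &p_free_wc, &p_done_wc ) == IB_SUCCESS )
> >>>>             process( p_done_wc );
> >>>>
> >>>>         /* Only rearm once the CQ looks empty. */
> >>>>         rearm_cq( h_recv_cq, FALSE );
> >>>>
> >>>>         /* Poll once more: a completion that arrived just before the
> >>>>          * rearm won't raise an event on a spec-only HCA, so it has to
> >>>>          * be picked up (or the DPC requeued) here. */
> >>>>         if( poll_cq( h_recv_cq, &p_free_wc, &p_done_wc ) != IB_SUCCESS )
> >>>>             break;
> >>>>         process( p_done_wc );
> >>>>     }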
> >>>>
> >>>> Personally I like the Mellanox semantics better (issue the callback
> >>>> if there are any WCs left in the CQ) - it makes for simpler code.
> >>>> An HCA driver can enforce these semantics anyway if the hardware
> >>>> doesn't support it.
> >>>>
> >>>> So while not IB compliant, I think this is fine.
> >>>>
> >>>> -Fab
> >>>>
> >>
> 


