[ofw] RE: Patch: [ipoib] Limit the time that is being spent in aDPC(don't handle more than 128 packet at one recv call)

Fab Tillier ftillier at windows.microsoft.com
Sun Nov 9 18:22:05 PST 2008


Ah, yes, that makes sense.  Does it work? :)

> -----Original Message-----
> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Sunday, November 09, 2008 1:52 PM
> To: Fab Tillier; Sean Hefty; ofw at lists.openfabrics.org
> Subject: RE: Patch: [ipoib] Limit the time that is being spent in
> aDPC(don't handle more than 128 packet at one recv call)
>
> Hi Fab,
>
> I was trying to do what you suggested:
>
> The function recv_cb takes a reference when it is called (even before
> I have touched it).
>
> At the end of this function, I need to decide if to rearm or to enter
> a new DPC.
> In the case that I arm, I remove the reference.
>
> In the case that I run the DPC, I don't remove the DPC. It is only
> being removed in the function __recv_cb_dpc after I have called the
> function __recv_cb.
>
> Does this make sense?
>
> Thanks
> Tzachi
>
>> -----Original Message-----
>> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
>> Sent: Sunday, November 09, 2008 10:06 PM
>> To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
>> Subject: RE: Patch: [ipoib] Limit the time that is being spent in
>> aDPC(don't handle more than 128 packet at one recv call)
>>
>> Hi Tzachi,
>>
>> I don't see any code to make sure the DPC isn't queued during cleanup.
>> You could get a situation where the regular polling loop polled the
>> last item and recv_cnt exceeds its limit - a DPC will get queued, but
>> there could be no receives posted to the QP anymore.
>>
>> The cleanup routine should make sure the DPC isn't queued or running
>> (maybe take a reference on the port object when the DPC is queued,
>> that you release after it runs?
>>
>> -Fab
>>
>>> -----Original Message-----
>>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>>> Sent: Sunday, November 09, 2008 11:27 AM
>>> To: Fab Tillier; Sean Hefty; ofw at lists.openfabrics.org
>>> Subject: RE: Patch: [ipoib] Limit the time that is being spent in
>>> a DPC(don't handle more than 128 packet at one recv call)
>>>
>>> Actually, it doesn't work even with Mellanox HCAs.
>>>
>>> Attached is a fix that does work (I hope). This patch creates a
>>> DPC that is used to complete the pooling if needed.
>>>
>>> Thanks
>>> Tzachi
>>>
>>> Index: ulp/ipoib/kernel/ipoib_port.c
>>>
>> ==============================================================
>> ===== ---
>>> ulp/ipoib/kernel/ipoib_port.c       (revision 1737) +++
>>> ulp/ipoib/kernel/ipoib_port.c       (working copy) @@ -70,6 +70,11 @@
>>>
>>>  static void __port_mcast_garbage_dpc(KDPC *p_gc_dpc,void
>>>  *context,void  *s_arg1, void *s_arg2); static void
>>>  __port_do_mcast_garbage(ipoib_port_t* const        p_port
>>> ); + + +static void __recv_cb_dpc(KDPC *p_gc_dpc,void *context,void
>>> *s_arg1, void *s_arg2); + +
>>>
>>>
>>
> /*********************************************************************
>>> *
>>> * *******
>>>  *
>>>  * Declarations
>>> @@ -676,6 +681,10 @@
>>>                         p_adapter->p_ifc->get_err_str( status )) );
>>>                 return status;
>>>         }
>>> +
>>> +
>>> KeInitializeDpc(&p_port-
>>>> recv_dpc,(PKDEFERRED_ROUTINE)__recv_cb_dpc,p_po
>>> rt);
>>> +
>>> +
>>>          /* Initialize multicast garbage collector timer and DPC
>>> object */
>>>
>>> KeInitializeDpc(&p_port-
>>>> gc_dpc,(PKDEFERRED_ROUTINE)__port_mcast_garbage
>>> _dpc,p_port);
>>>
>>> KeInitializeTimerEx(&p_port->gc_timer,SynchronizationTimer); @@
>>> -1602,7 +1611,23 @@
>>>         IPOIB_EXIT( IPOIB_DBG_RECV );  }  +static void
>>> __recv_cb_dpc(KDPC *p_gc_dpc,void *context,void * s_arg1 +, void *
>>> s_arg2) +{
>>>
>>> +       ipoib_port_t *p_port = context;
>>> +
>>> +       UNREFERENCED_PARAMETER(p_gc_dpc);
>>> +       UNREFERENCED_PARAMETER(s_arg1);
>>> +       UNREFERENCED_PARAMETER(s_arg2);
>>> +
>>> +
>>> +       __recv_cb(NULL, p_port);
>>> +       ipoib_port_deref( p_port, ref_recv_cb );
>>> +
>>> +
>>> +}
>>> +
>>> +
>>>  static void
>>>  __recv_cb(
>>>         IN              const   ib_cq_handle_t
>>> h_cq,
>>> @@ -1666,7 +1691,7 @@
>>>                 recv_cnt += __recv_mgr_filter( p_port, p_wc,
>>>                 &done_list, &bad_list ); cl_perf_stop(
>>>                 &p_port->p_adapter- perf, FilterRecv );
>>>
>>> -       } while( !p_free );
>>> +       } while( (!p_free) && (recv_cnt < 128));
>>>
>>>         /* We're done looking at the endpoint map, release the
>>>         reference. */ cl_atomic_dec( &p_port->endpt_rdr ); @@ -1745,18
>>>         +1770,23 @@ } while( pkt_cnt ); cl_spinlock_release(
>>>         &p_port->recv_lock );
>>>  -       /* -        * Rearm after filtering to prevent
>> contention on
>>> the enpoint maps -        * and eliminate the possibility of having a
>>> call to -        * __endpt_mgr_insert find a duplicate. -
>>      */ -
>>>    cl_perf_start( RearmRecv ); -       status =
>>> p_port->p_adapter->p_ifc->rearm_cq( -
>>> p_port->ib_mgr.h_recv_cq, FALSE ); -       cl_perf_stop(
>>> &p_port->p_adapter->perf, RearmRecv ); -       CL_ASSERT( status ==
>>> IB_SUCCESS ); +       if (p_free ) { +               /* +
>>> * Rearm after filtering to prevent contention on the enpoint maps +
>>>           * and eliminate the possibility of having a call to +
>>>       * __endpt_mgr_insert find a duplicate. +                */ +
>>>         cl_perf_start( RearmRecv ); +               status =
>>> p_port->p_adapter->p_ifc->rearm_cq( + p_port->ib_mgr.h_recv_cq, FALSE
>>> ); +               cl_perf_stop( &p_port->p_adapter->perf, RearmRecv
>>> ); + CL_ASSERT( status == IB_SUCCESS );
>>>
>>> -       ipoib_port_deref( p_port, ref_recv_cb );
>>> +               ipoib_port_deref( p_port, ref_recv_cb );
>>> +       } else {
>>> +               // Please note the reference is still up
>>> +               KeInsertQueueDpc(&p_port->recv_dpc, NULL, NULL);
>>> +       }
>>>
>>>         cl_perf_stop( &p_port->p_adapter->perf, RecvCb );
>>>  Index: ulp/ipoib/kernel/ipoib_port.h
>>>
>> ==============================================================
>> ===== ---
>>> ulp/ipoib/kernel/ipoib_port.h       (revision 1724) +++
>>> ulp/ipoib/kernel/ipoib_port.h       (working copy) @@
>> -502,6 +502,8 @@
>>>         ipoib_recv_mgr_t                recv_mgr;
>>>         ipoib_send_mgr_t                send_mgr;
>>>
>>> +       KDPC                                    recv_dpc;
>>> +
>>>         ipoib_endpt_mgr_t               endpt_mgr;
>>>
>>>         ipoib_endpt_t                   *p_local_endpt;
>>>
>>>> -----Original Message----- From: Fab Tillier
>>>> [mailto:ftillier at windows.microsoft.com] Sent: Friday, November 07,
>>>> 2008 12:56 AM To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
>>>> Subject: RE: Patch: [ipoib] Limit the time that is being spent in a
>>>> DPC(don't handle more than 128 packet at one recv call)
>>>>
>>>>> At the end of the recv_cb function we arm the cq again.
>>>>>
>>>>> After the arm a new EQE will be created and the function will be
>>>>> called again.
>>>>  That will only work reliably with Mellanox HCAs, though - from an IB
>>>> spec perspective the DPC should be requeue'd, and then poll until the
>>>> CQ is empty, and only then rearm.
>>>>
>>>> Personally I like the Mellanox semantics better (issue the callback
>>>> if there are any WCs left in the CQ) - it makes for simpler code.  An
>>>> HCA driver can enforce these semantics anyway if the hardware doesn't
>>>> support it.
>>>>
>>>> So while not IB compliant, I think this is fine.
>>>>
>>>> -Fab
>>>>
>>



More information about the ofw mailing list