[ofw] RE: Patch: [ipoib] Limit the time that is being spent in a DPC (don't handle more than 128 packets at one recv call)

Fab Tillier ftillier at windows.microsoft.com
Sun Nov 9 12:06:22 PST 2008


Hi Tzachi,

I don't see any code to make sure the DPC isn't queued during cleanup. You could get a situation where the regular polling loop polled the last item just as recv_cnt reached its limit - a DPC will get queued, but there may be no receives posted to the QP anymore.

The cleanup routine should make sure the DPC isn't queued or running (maybe take a reference on the port object when the DPC is queued, and release it after the DPC runs?).
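
Something along these lines is what I have in mind - a rough, untested sketch only; ipoib_port_ref() and the exact cleanup routine are placeholders for whatever the port code actually uses:

/* When deferring work, take the reference before queuing so the DPC can
 * never run against a torn-down port. */
ipoib_port_ref( p_port, ref_recv_cb );
if( !KeInsertQueueDpc( &p_port->recv_dpc, NULL, NULL ) )
{
        /* DPC was already queued - drop the extra reference. */
        ipoib_port_deref( p_port, ref_recv_cb );
}

/* In the port cleanup path (PASSIVE_LEVEL): */
if( KeRemoveQueueDpc( &p_port->recv_dpc ) )
{
        /* The DPC was queued but will never run - release its reference. */
        ipoib_port_deref( p_port, ref_recv_cb );
}
/* Wait for a DPC instance that may already be executing. */
KeFlushQueuedDpcs();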

-Fab

> -----Original Message-----
> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Sunday, November 09, 2008 11:27 AM
> To: Fab Tillier; Sean Hefty; ofw at lists.openfabrics.org
> Subject: RE: Patch: [ipoib] Limit the time that is being spent in a
> DPC(don't handle more than 128 packet at one recv call)
>
> Actually, it doesn't work even with Mellanox HCAs.
>
> Attached is a fix that does work (I hope). This patch creates a DPC
> that is used to complete the polling if needed.
>
> Thanks
> Tzachi
>
> Index: ulp/ipoib/kernel/ipoib_port.c
> ===================================================================
> --- ulp/ipoib/kernel/ipoib_port.c       (revision 1737)
> +++ ulp/ipoib/kernel/ipoib_port.c       (working copy)
> @@ -70,6 +70,11 @@
>
>  static void __port_mcast_garbage_dpc(KDPC *p_gc_dpc,void *context,void *s_arg1, void *s_arg2);
>  static void __port_do_mcast_garbage(ipoib_port_t* const        p_port );
> +
> +
> +static void __recv_cb_dpc(KDPC *p_gc_dpc,void *context,void *s_arg1, void *s_arg2);
> +
> +
>
>  /******************************************************************************
>  *
>  * Declarations
> @@ -676,6 +681,10 @@
>                         p_adapter->p_ifc->get_err_str( status )) );
>                 return status;
>         }
> +
> +       KeInitializeDpc(&p_port->recv_dpc,(PKDEFERRED_ROUTINE)__recv_cb_dpc,p_port);
> +
> +
>         /* Initialize multicast garbage collector timer and DPC object */
>         KeInitializeDpc(&p_port->gc_dpc,(PKDEFERRED_ROUTINE)__port_mcast_garbage_dpc,p_port);
>         KeInitializeTimerEx(&p_port->gc_timer,SynchronizationTimer);
> @@ -1602,7 +1611,23 @@
>         IPOIB_EXIT( IPOIB_DBG_RECV );
>  }
>
> +static void __recv_cb_dpc(KDPC *p_gc_dpc,void *context,void * s_arg1, void * s_arg2)
> +{
> +       ipoib_port_t *p_port = context;
> +
> +       UNREFERENCED_PARAMETER(p_gc_dpc);
> +       UNREFERENCED_PARAMETER(s_arg1);
> +       UNREFERENCED_PARAMETER(s_arg2);
> +
> +
> +       __recv_cb(NULL, p_port);
> +       ipoib_port_deref( p_port, ref_recv_cb );
> +
> +
> +}
> +
> +
>  static void
>  __recv_cb(
>         IN              const   ib_cq_handle_t                          h_cq,
> @@ -1666,7 +1691,7 @@
>                 recv_cnt += __recv_mgr_filter( p_port, p_wc, &done_list, &bad_list );
>                 cl_perf_stop( &p_port->p_adapter->perf, FilterRecv );
>
> -       } while( !p_free );
> +       } while( (!p_free) && (recv_cnt < 128));
>
>         /* We're done looking at the endpoint map, release the reference. */
>         cl_atomic_dec( &p_port->endpt_rdr );
> @@ -1745,18 +1770,23 @@
>         } while( pkt_cnt );
>         cl_spinlock_release( &p_port->recv_lock );
>
> -       /*
> -        * Rearm after filtering to prevent contention on the enpoint maps
> -        * and eliminate the possibility of having a call to
> -        * __endpt_mgr_insert find a duplicate.
> -        */
> -       cl_perf_start( RearmRecv );
> -       status = p_port->p_adapter->p_ifc->rearm_cq(
> -               p_port->ib_mgr.h_recv_cq, FALSE );
> -       cl_perf_stop( &p_port->p_adapter->perf, RearmRecv );
> -       CL_ASSERT( status == IB_SUCCESS );
> +       if (p_free ) {
> +               /*
> +                * Rearm after filtering to prevent contention on the enpoint maps
> +                * and eliminate the possibility of having a call to
> +                * __endpt_mgr_insert find a duplicate.
> +                */
> +               cl_perf_start( RearmRecv );
> +               status = p_port->p_adapter->p_ifc->rearm_cq(
> +                       p_port->ib_mgr.h_recv_cq, FALSE );
> +               cl_perf_stop( &p_port->p_adapter->perf, RearmRecv );
> +               CL_ASSERT( status == IB_SUCCESS );
>
> -       ipoib_port_deref( p_port, ref_recv_cb );
> +               ipoib_port_deref( p_port, ref_recv_cb );
> +       } else {
> +               // Please note the reference is still up
> +               KeInsertQueueDpc(&p_port->recv_dpc, NULL, NULL);
> +       }
>
>         cl_perf_stop( &p_port->p_adapter->perf, RecvCb );
> Index: ulp/ipoib/kernel/ipoib_port.h
> ===================================================================
> --- ulp/ipoib/kernel/ipoib_port.h       (revision 1724)
> +++ ulp/ipoib/kernel/ipoib_port.h       (working copy)
> @@ -502,6 +502,8 @@
>         ipoib_recv_mgr_t                recv_mgr;
>         ipoib_send_mgr_t                send_mgr;
>
> +       KDPC                                    recv_dpc;
> +
>         ipoib_endpt_mgr_t               endpt_mgr;
>
>         ipoib_endpt_t                   *p_local_endpt;
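
To make the intent of the patch easier to follow, the receive path now boils down to roughly this (a condensed sketch, not the literal code - error handling and the perf counters are omitted):

/* Poll at most a fixed budget of completions per callback; if the CQ may
 * still hold entries, hand the rest to a DPC instead of rearming. */
recv_cnt = 0;
do {
        /* Poll a batch of WCs from the recv CQ and filter them; p_free
         * becomes non-NULL once the poll returns fewer WCs than requested,
         * i.e. the CQ has been drained. */
        recv_cnt += __recv_mgr_filter( p_port, p_wc, &done_list, &bad_list );
} while( !p_free && recv_cnt < 128 );

if( p_free ) {
        /* CQ drained: rearm so the next completion raises a new callback,
         * and drop the callback's reference on the port. */
        p_port->p_adapter->p_ifc->rearm_cq( p_port->ib_mgr.h_recv_cq, FALSE );
        ipoib_port_deref( p_port, ref_recv_cb );
} else {
        /* Budget exhausted: keep the reference and finish polling from the
         * DPC, whose routine calls __recv_cb() again. */
        KeInsertQueueDpc( &p_port->recv_dpc, NULL, NULL );
}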
>
>> -----Original Message-----
>> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
>> Sent: Friday, November 07, 2008 12:56 AM
>> To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
>> Subject: RE: Patch: [ipoib] Limit the time that is being spent in a
>> DPC(don't handle more than 128 packet at one recv call)
>>
>>> At the end of the recv_cb function we arm the cq again.
>>>
>>> After the arm a new EQE will be created and the function will be
>>> called again.
>>
>> That will only work reliably with Mellanox HCAs, though - from an IB
>> spec perspective the DPC should be requeue'd, and then poll until the
>> CQ is empty, and only then rearm.
>>
>> Personally I like the Mellanox semantics better (issue the callback
>> if there are any WCs left in the CQ) - it makes for simpler code.  An
>> HCA driver can enforce these semantics anyway if the hardware doesn't
>> support it.
>>
>> So while not IB compliant, I think this is fine.
>>
>> -Fab
>>
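
For reference, the IB-spec-compliant pattern described above looks roughly like this (illustrative only - poll_cq(), rearm_cq() and process_wcs() stand in for the driver's actual verbs interface calls):

for( ;; )
{
        /* Drain everything currently in the CQ. */
        while( poll_cq( h_cq, &p_free, &p_done_wc ) == IB_SUCCESS )
                process_wcs( p_done_wc );

        /* CQ looks empty - request notification for the next completion. */
        rearm_cq( h_cq, FALSE );

        /* Close the race: a completion that arrived between the last (empty)
         * poll and the rearm will not raise an event, so poll once more. */
        if( poll_cq( h_cq, &p_free, &p_done_wc ) != IB_SUCCESS )
                break;  /* still empty - wait for the next completion event */

        process_wcs( p_done_wc );
}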


