[ofw] [IPOIB_NDIS6_CM] enhance wc linking loop performance by removing array index calculations

Wed Aug 25 05:01:24 PDT 2010

Please commit.

Thanks
Tzachi

> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Monday, August 23, 2010 8:55 PM
> To: Tzachi Dar; Hefty, Sean; Alex Naslednikov
> Cc: ofw at lists.openfabrics.org
> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> removing array index calculations
> 
> Tzachi Dar wrote:
> > By the way, by experiments that xalex has done in the past he saw
> that
> > actually the fastest way to initialize this lists was to create the
> > list once on another place and then do a memcopy of the result every
> > time you need the list.
> >
> > That said, I'm not sure that we want to go in that way. The best way
> > to improve performance here is very likely to drop the linked lists
> > and move to an array...
> 
> Yes - an array interface would be the best.
> 
> As a side note, using Sean's loop test pgm (modified to closer resemble
> ipoib wc init code), built in the WDK (x64) environment with no
> additional optimization switches, Sean's results are replicated:
> The wc pointer loop is slightly faster than using array indexes.
> 
> MF:\Test-pgms\Loop-Timing>date /t & bin\amd64\lt.exe Mon 08/23/2010
> loops 10000000 s 95622189567 e 95636499284 f 14318180
> f1 0.99941
> f2 0.78953
> 
> where f1 ==
>         for (i = 0; i < LOOPS; i++) {
>                 struct ibv_wc wc[MAX_SEND_WC];
>                 int i;
> 
>                 for (i = 0; i < MAX_SEND_WC; i++) {
>                         wc[i].next = &wc[i + 1];
>                 }
>                 wc[MAX_SEND_WC - 1].next = NULL;
>         }
> 
> f2 ==
>         for (i = 0; i < LOOPS; i++) {
>                 struct ibv_wc wc[MAX_SEND_WC];
>                 struct ibv_wc *p;
> 
>                 for (p = wc; p < &wc[MAX_SEND_WC - 1]; p++ ) {
>                         p->next = p + 1;
>                 }
>                 p->next = NULL;
>         }
> 
> I beleve the ipoib loop performance patch is valid to commit, although
> not critical.
> 
> stan.
> 
> >
> > Thanks
> > Tzachi
> >
> >> -----Original Message-----
> >> From: Smith, Stan [mailto:stan.smith at intel.com]
> >> Sent: Monday, August 09, 2010 11:23 PM
> >> To: Tzachi Dar
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> >> removing array index calculations
> >>
> >> Tzachi Dar wrote:
> >>> Have you been actually being able to measure a difference?
> >>>
> >>> I believe that your code is better, but I wander if it really has
> an
> >>> affect that we can measure.
> >>
> >> In the following code sequence modified to use pointers (with Sean's
> >> observations)
> >>
> >> for( i = 0; i < MAX_SEND_WC; i++ )
> >>      wc[i].p_next = &wc[i + 1];
> >> wc[MAX_SEND_WC - 1].p_next = NULL;
> >>
> >> for( p_free=wc; p_free < &wc[MAX_SEND_WC - 1]; p_free++ )
> >>      p_free->p_next = p_free + 1;
> >> p_free->p_next = NULL;
> >>
> >> If the MS WDK compiler optimizations are really 'good', it might
> >> optimize the loops to basically the same instruction sequence.  I do
> >> not believe this to be the case.
> >>
> >> The slightly faster execution time is based on the observation that
> >> the total number of instructions executed is reduced by skipping the
> >> array index arithmetic by use of pointers.
> >>
> >> Since these loops live in the Tx & Rx speed paths, every little bit
> >> helps.
> >>
> >>
> >>>
> >>> Thanks
> >>> Tzachi
> >>>
> >>>> -----Original Message-----
> >>>> From: Smith, Stan [mailto:stan.smith at intel.com]
> >>>> Sent: Wednesday, August 04, 2010 8:43 PM
> >>>> To: Tzachi Dar
> >>>> Cc: ofw at lists.openfabrics.org
> >>>> Subject: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> >>>> removing array index calculations
> >>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>> While reading IPOIB code I noticed a minor speed enhancement in CQ
> >>>> callback routines. When linking WC (work complete) items into a
> >>>> list, by removing the array index calculations in favor of pointer
> >>>> arithmetic the loop will execute slightly faster.
> >>>>
> >>>> Worth a commit?
> >>>>
> >>>> stan.
> >>>>
> >>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp      Wed Aug 04
> >>>> 10:30:43 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp
> >>>> Wed Aug 04 10:28:59 2010 @@ -888,9 +888,10 @@
> >>>>                 p_port->p_adapter->p_ifc->modify_qp( p_endpt-
> >>>>> conn.h_send_qp, &mod_attr );
> >>>>                 p_port->p_adapter->p_ifc->modify_qp( p_endpt-
> >>>>> conn.h_recv_qp, &mod_attr );
> >>>>
> >>>> -               for( i = 0; i < MAX_RECV_WC; i++ )
> >>>> -                       wc[i].p_next = &wc[i + 1];
> >>>> -               wc[MAX_RECV_WC - 1].p_next = NULL;
> >>>> +       for( p_free_wc=wc; p_free_wc < &wc[MAX_RECV_WC];
> >>>> p_free_wc++ ) +               p_free_wc->p_next = p_free_wc + 1; +
> >>>> +       (--p_free_wc)->p_next = NULL;
> >>>>
> >>>>                 do
> >>>>                 {
> >>>>
> >>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04
> 10:29:33
> >>>> 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04
> >>>> 10:28:31 2010 @@ -1987,7 +1987,6 @@
> >>>>         ib_wc_t                         wc[MAX_RECV_WC], *p_free,
> >>>>         *p_wc; int32_t                         NBL_cnt, recv_cnt =
> >>>>         0, shortage, discarded; cl_qlist_t
> >>>> done_list, bad_list; -       size_t                          i;
> >>>>         ULONG                           recv_complete_flags = 0;
> >>>>         BOOLEAN                         res;
> >>>>
> >>>> @@ -2017,9 +2016,11 @@
> >>>>         cl_qlist_init( &bad_list );
> >>>>
> >>>>         ipoib_port_ref( p_port, ref_recv_cb );
> >>>> -       for( i = 0; i < MAX_RECV_WC; i++ )
> >>>> -               wc[i].p_next = &wc[i + 1];
> >>>> -       wc[MAX_RECV_WC - 1].p_next = NULL;
> >>>> +
> >>>> +       for( p_free=wc; p_free < &wc[MAX_RECV_WC]; p_free++ )
> >>>> +               p_free->p_next = p_free + 1;
> >>>> +
> >>>> +       (--p_free)->p_next = NULL;
> >>>>
> >>>>         /*
> >>>>          * We'll be accessing the endpoint map so take a reference
> >>>>         @@ -5769,7 +5770,6 @@ cl_qlist_t
> >>>>         done_list; ipoib_endpt_t           *p_endpt;
> >>>>         ip_stat_sel_t           type;
> >>>> -       size_t                          i;
> >>>>         NET_BUFFER                      *p_netbuffer = NULL;
> >>>>         ipoib_send_NB_SG        *s_buf;
> >>>>
> >>>> @@ -5798,9 +5798,10 @@
> >>>>         //cl_qlist_check_validity(&p_port->send_mgr.pending_list);
> >>>>         ipoib_port_ref( p_port, ref_send_cb );
> >>>>
> >>>> -       for( i = 0; i < MAX_SEND_WC; i++ )
> >>>> -               wc[i].p_next = &wc[i + 1];
> >>>> -       wc[MAX_SEND_WC - 1].p_next = NULL;
> >>>> +       for( p_free=wc; p_free < &wc[MAX_SEND_WC]; p_free++ )
> >>>> +               p_free->p_next = p_free + 1;
> >>>> +
> >>>> +       (--p_free)->p_next = NULL;
> >>>>
> >>>>         do
> >>>>         {