[ofw] [IPOIB_NDIS6_CM] enhance wc linking loop performance by removing array index calculations

Mon Aug 9 22:57:05 PDT 2010

By the way, by experiments that xalex has done in the past he saw that actually the fastest way to initialize this lists was to create the list once on another place and then do a memcopy of the result every time you need the list.

That said, I'm not sure that we want to go in that way. The best way to improve performance here is very likely to drop the linked lists and move to an array...

Thanks
Tzachi

> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Monday, August 09, 2010 11:23 PM
> To: Tzachi Dar
> Cc: ofw at lists.openfabrics.org
> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> removing array index calculations
> 
> Tzachi Dar wrote:
> > Have you been actually being able to measure a difference?
> >
> > I believe that your code is better, but I wander if it really has an
> > affect that we can measure.
> 
> In the following code sequence modified to use pointers (with Sean's
> observations)
> 
> for( i = 0; i < MAX_SEND_WC; i++ )
>      wc[i].p_next = &wc[i + 1];
> wc[MAX_SEND_WC - 1].p_next = NULL;
> 
> for( p_free=wc; p_free < &wc[MAX_SEND_WC - 1]; p_free++ )
>      p_free->p_next = p_free + 1;
> p_free->p_next = NULL;
> 
> If the MS WDK compiler optimizations are really 'good', it might
> optimize the loops to basically the same instruction sequence.  I do
> not believe this to be the case.
> 
> The slightly faster execution time is based on the observation that the
> total number of instructions executed is reduced by skipping the array
> index arithmetic by use of pointers.
> 
> Since these loops live in the Tx & Rx speed paths, every little bit
> helps.
> 
> 
> >
> > Thanks
> > Tzachi
> >
> >> -----Original Message-----
> >> From: Smith, Stan [mailto:stan.smith at intel.com]
> >> Sent: Wednesday, August 04, 2010 8:43 PM
> >> To: Tzachi Dar
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> >> removing array index calculations
> >>
> >>
> >> Hello,
> >>
> >> While reading IPOIB code I noticed a minor speed enhancement in CQ
> >> callback routines. When linking WC (work complete) items into a
> >> list, by removing the array index calculations in favor of pointer
> >> arithmetic the loop will execute slightly faster.
> >>
> >> Worth a commit?
> >>
> >> stan.
> >>
> >> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp      Wed Aug 04
> >> 10:30:43 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp
> >> Wed Aug 04 10:28:59 2010 @@ -888,9 +888,10 @@
> >>                 p_port->p_adapter->p_ifc->modify_qp( p_endpt-
> >>> conn.h_send_qp, &mod_attr );
> >>                 p_port->p_adapter->p_ifc->modify_qp( p_endpt-
> >>> conn.h_recv_qp, &mod_attr );
> >>
> >> -               for( i = 0; i < MAX_RECV_WC; i++ )
> >> -                       wc[i].p_next = &wc[i + 1];
> >> -               wc[MAX_RECV_WC - 1].p_next = NULL;
> >> +       for( p_free_wc=wc; p_free_wc < &wc[MAX_RECV_WC]; p_free_wc++
> >> ) +               p_free_wc->p_next = p_free_wc + 1;
> >> +
> >> +       (--p_free_wc)->p_next = NULL;
> >>
> >>                 do
> >>                 {
> >>
> >> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04 10:29:33
> >> 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04
> >> 10:28:31 2010 @@ -1987,7 +1987,6 @@
> >>         ib_wc_t                         wc[MAX_RECV_WC], *p_free,
> >>         *p_wc; int32_t                         NBL_cnt, recv_cnt =
> >>         0, shortage, discarded; cl_qlist_t
> >> done_list, bad_list; -       size_t                          i;
> >>         ULONG                           recv_complete_flags = 0;
> >>         BOOLEAN                         res;
> >>
> >> @@ -2017,9 +2016,11 @@
> >>         cl_qlist_init( &bad_list );
> >>
> >>         ipoib_port_ref( p_port, ref_recv_cb );
> >> -       for( i = 0; i < MAX_RECV_WC; i++ )
> >> -               wc[i].p_next = &wc[i + 1];
> >> -       wc[MAX_RECV_WC - 1].p_next = NULL;
> >> +
> >> +       for( p_free=wc; p_free < &wc[MAX_RECV_WC]; p_free++ )
> >> +               p_free->p_next = p_free + 1;
> >> +
> >> +       (--p_free)->p_next = NULL;
> >>
> >>         /*
> >>          * We'll be accessing the endpoint map so take a reference
> >>         @@ -5769,7 +5770,6 @@ cl_qlist_t
> >>         done_list; ipoib_endpt_t           *p_endpt;
> >>         ip_stat_sel_t           type;
> >> -       size_t                          i;
> >>         NET_BUFFER                      *p_netbuffer = NULL;
> >>         ipoib_send_NB_SG        *s_buf;
> >>
> >> @@ -5798,9 +5798,10 @@
> >>         //cl_qlist_check_validity(&p_port->send_mgr.pending_list);
> >>         ipoib_port_ref( p_port, ref_send_cb );
> >>
> >> -       for( i = 0; i < MAX_SEND_WC; i++ )
> >> -               wc[i].p_next = &wc[i + 1];
> >> -       wc[MAX_SEND_WC - 1].p_next = NULL;
> >> +       for( p_free=wc; p_free < &wc[MAX_SEND_WC]; p_free++ )
> >> +               p_free->p_next = p_free + 1;
> >> +
> >> +       (--p_free)->p_next = NULL;
> >>
> >>         do
> >>         {