[ofw] [IPOIB_NDIS6_CM] enhance wc linking loop performance by removing array index calculations
Tzachi Dar
tzachid at mellanox.co.il
Wed Aug 25 05:01:24 PDT 2010
Please commit.
Thanks
Tzachi
> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Monday, August 23, 2010 8:55 PM
> To: Tzachi Dar; Hefty, Sean; Alex Naslednikov
> Cc: ofw at lists.openfabrics.org
> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> removing array index calculations
>
> Tzachi Dar wrote:
> > By the way, by experiments that xalex has done in the past he saw
> that
> > actually the fastest way to initialize this lists was to create the
> > list once on another place and then do a memcopy of the result every
> > time you need the list.
> >
> > That said, I'm not sure that we want to go in that way. The best way
> > to improve performance here is very likely to drop the linked lists
> > and move to an array...
>
> Yes - an array interface would be the best.
>
> As a side note, using Sean's loop test pgm (modified to closer resemble
> ipoib wc init code), built in the WDK (x64) environment with no
> additional optimization switches, Sean's results are replicated:
> The wc pointer loop is slightly faster than using array indexes.
>
> MF:\Test-pgms\Loop-Timing>date /t & bin\amd64\lt.exe Mon 08/23/2010
> loops 10000000 s 95622189567 e 95636499284 f 14318180
> f1 0.99941
> f2 0.78953
>
> where f1 ==
> for (i = 0; i < LOOPS; i++) {
> struct ibv_wc wc[MAX_SEND_WC];
> int i;
>
> for (i = 0; i < MAX_SEND_WC; i++) {
> wc[i].next = &wc[i + 1];
> }
> wc[MAX_SEND_WC - 1].next = NULL;
> }
>
> f2 ==
> for (i = 0; i < LOOPS; i++) {
> struct ibv_wc wc[MAX_SEND_WC];
> struct ibv_wc *p;
>
> for (p = wc; p < &wc[MAX_SEND_WC - 1]; p++ ) {
> p->next = p + 1;
> }
> p->next = NULL;
> }
>
> I beleve the ipoib loop performance patch is valid to commit, although
> not critical.
>
> stan.
>
> >
> > Thanks
> > Tzachi
> >
> >> -----Original Message-----
> >> From: Smith, Stan [mailto:stan.smith at intel.com]
> >> Sent: Monday, August 09, 2010 11:23 PM
> >> To: Tzachi Dar
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> >> removing array index calculations
> >>
> >> Tzachi Dar wrote:
> >>> Have you been actually being able to measure a difference?
> >>>
> >>> I believe that your code is better, but I wander if it really has
> an
> >>> affect that we can measure.
> >>
> >> In the following code sequence modified to use pointers (with Sean's
> >> observations)
> >>
> >> for( i = 0; i < MAX_SEND_WC; i++ )
> >> wc[i].p_next = &wc[i + 1];
> >> wc[MAX_SEND_WC - 1].p_next = NULL;
> >>
> >> for( p_free=wc; p_free < &wc[MAX_SEND_WC - 1]; p_free++ )
> >> p_free->p_next = p_free + 1;
> >> p_free->p_next = NULL;
> >>
> >> If the MS WDK compiler optimizations are really 'good', it might
> >> optimize the loops to basically the same instruction sequence. I do
> >> not believe this to be the case.
> >>
> >> The slightly faster execution time is based on the observation that
> >> the total number of instructions executed is reduced by skipping the
> >> array index arithmetic by use of pointers.
> >>
> >> Since these loops live in the Tx & Rx speed paths, every little bit
> >> helps.
> >>
> >>
> >>>
> >>> Thanks
> >>> Tzachi
> >>>
> >>>> -----Original Message-----
> >>>> From: Smith, Stan [mailto:stan.smith at intel.com]
> >>>> Sent: Wednesday, August 04, 2010 8:43 PM
> >>>> To: Tzachi Dar
> >>>> Cc: ofw at lists.openfabrics.org
> >>>> Subject: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
> >>>> removing array index calculations
> >>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>> While reading IPOIB code I noticed a minor speed enhancement in CQ
> >>>> callback routines. When linking WC (work complete) items into a
> >>>> list, by removing the array index calculations in favor of pointer
> >>>> arithmetic the loop will execute slightly faster.
> >>>>
> >>>> Worth a commit?
> >>>>
> >>>> stan.
> >>>>
> >>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp Wed Aug 04
> >>>> 10:30:43 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp
> >>>> Wed Aug 04 10:28:59 2010 @@ -888,9 +888,10 @@
> >>>> p_port->p_adapter->p_ifc->modify_qp( p_endpt-
> >>>>> conn.h_send_qp, &mod_attr );
> >>>> p_port->p_adapter->p_ifc->modify_qp( p_endpt-
> >>>>> conn.h_recv_qp, &mod_attr );
> >>>>
> >>>> - for( i = 0; i < MAX_RECV_WC; i++ )
> >>>> - wc[i].p_next = &wc[i + 1];
> >>>> - wc[MAX_RECV_WC - 1].p_next = NULL;
> >>>> + for( p_free_wc=wc; p_free_wc < &wc[MAX_RECV_WC];
> >>>> p_free_wc++ ) + p_free_wc->p_next = p_free_wc + 1; +
> >>>> + (--p_free_wc)->p_next = NULL;
> >>>>
> >>>> do
> >>>> {
> >>>>
> >>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp Wed Aug 04
> 10:29:33
> >>>> 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp Wed Aug 04
> >>>> 10:28:31 2010 @@ -1987,7 +1987,6 @@
> >>>> ib_wc_t wc[MAX_RECV_WC], *p_free,
> >>>> *p_wc; int32_t NBL_cnt, recv_cnt =
> >>>> 0, shortage, discarded; cl_qlist_t
> >>>> done_list, bad_list; - size_t i;
> >>>> ULONG recv_complete_flags = 0;
> >>>> BOOLEAN res;
> >>>>
> >>>> @@ -2017,9 +2016,11 @@
> >>>> cl_qlist_init( &bad_list );
> >>>>
> >>>> ipoib_port_ref( p_port, ref_recv_cb );
> >>>> - for( i = 0; i < MAX_RECV_WC; i++ )
> >>>> - wc[i].p_next = &wc[i + 1];
> >>>> - wc[MAX_RECV_WC - 1].p_next = NULL;
> >>>> +
> >>>> + for( p_free=wc; p_free < &wc[MAX_RECV_WC]; p_free++ )
> >>>> + p_free->p_next = p_free + 1;
> >>>> +
> >>>> + (--p_free)->p_next = NULL;
> >>>>
> >>>> /*
> >>>> * We'll be accessing the endpoint map so take a reference
> >>>> @@ -5769,7 +5770,6 @@ cl_qlist_t
> >>>> done_list; ipoib_endpt_t *p_endpt;
> >>>> ip_stat_sel_t type;
> >>>> - size_t i;
> >>>> NET_BUFFER *p_netbuffer = NULL;
> >>>> ipoib_send_NB_SG *s_buf;
> >>>>
> >>>> @@ -5798,9 +5798,10 @@
> >>>> //cl_qlist_check_validity(&p_port->send_mgr.pending_list);
> >>>> ipoib_port_ref( p_port, ref_send_cb );
> >>>>
> >>>> - for( i = 0; i < MAX_SEND_WC; i++ )
> >>>> - wc[i].p_next = &wc[i + 1];
> >>>> - wc[MAX_SEND_WC - 1].p_next = NULL;
> >>>> + for( p_free=wc; p_free < &wc[MAX_SEND_WC]; p_free++ )
> >>>> + p_free->p_next = p_free + 1;
> >>>> +
> >>>> + (--p_free)->p_next = NULL;
> >>>>
> >>>> do
> >>>> {
More information about the ofw
mailing list