[ofw] [IPOIB_NDIS6_CM] enhance wc linking loop performance by removing array index calculations

Mon Aug 23 10:55:25 PDT 2010

Tzachi Dar wrote:
> By the way, in experiments that xalex did in the past, he saw that
> the fastest way to initialize these lists was to create the list once
> in another place and then memcpy the result every time you need the
> list.
>
> That said, I'm not sure that we want to go that way. The best way
> to improve performance here is very likely to drop the linked lists
> and move to an array...

Yes - an array interface would be the best.
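
To illustrate what an array interface could look like, here is a minimal sketch. Everything in it is hypothetical: fake_cq, wc_entry, poll_cq_array() and drain() are stand-ins invented for this example and are not the existing IBAL or ipoib API. The point is only that the caller hands in an array plus a capacity and gets back a count, so there is no free list to link before each poll.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real ib_wc_t or CQ types. */
struct wc_entry {
    unsigned long long wr_id;
    int status;
};

struct fake_cq {
    struct wc_entry pending[16];
    size_t head;    /* next entry to hand out */
    size_t count;   /* number of entries queued */
};

/*
 * Hypothetical array-style poll: copy up to 'capacity' completions
 * into 'out' and return how many were copied.
 */
static size_t poll_cq_array(struct fake_cq *cq, struct wc_entry *out,
                            size_t capacity)
{
    size_t n = 0;

    while (n < capacity && cq->head < cq->count)
        out[n++] = cq->pending[cq->head++];
    return n;
}

/*
 * Caller side: walk the returned entries by index. No p_next links
 * have to be initialized before each poll.
 */
static size_t drain(struct fake_cq *cq, unsigned long long *wr_id_sum)
{
    struct wc_entry wc[4];
    size_t total = 0, n, i;

    *wr_id_sum = 0;
    while ((n = poll_cq_array(cq, wc, 4)) != 0) {
        for (i = 0; i < n; i++)
            *wr_id_sum += wc[i].wr_id;
        total += n;
    }
    return total;
}
```

Whether this actually beats the linked list in the real driver would still need to be measured.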

As a side note, I used Sean's loop test program (modified to more closely resemble the ipoib wc init code), built in the WDK (x64) environment with no additional optimization switches, and replicated Sean's results:
the wc pointer loop (f2 below) is slightly faster than the array-index loop (f1).

MF:\Test-pgms\Loop-Timing>date /t & bin\amd64\lt.exe
Mon 08/23/2010
loops 10000000 s 95622189567 e 95636499284 f 14318180
f1 0.99941
f2 0.78953

where f1 ==
        for (i = 0; i < LOOPS; i++) {
                struct ibv_wc wc[MAX_SEND_WC];
                int i;

                for (i = 0; i < MAX_SEND_WC; i++) {
                        wc[i].next = &wc[i + 1];
                }
                wc[MAX_SEND_WC - 1].next = NULL;
        }

f2 ==
        for (i = 0; i < LOOPS; i++) {
                struct ibv_wc wc[MAX_SEND_WC];
                struct ibv_wc *p;

                for (p = wc; p < &wc[MAX_SEND_WC - 1]; p++ ) {
                        p->next = p + 1;
                }
                p->next = NULL;
        }
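
For anyone who wants to repeat the comparison without the attachment, here is a minimal sketch of the two linking styles plus a check that they build the same chain. It is not lt.c: wc_item, the MAX_SEND_WC value, and the helper names are assumptions made for this example, and clock() stands in for whatever timer the original test used. Calling the linkers through a function pointer also keeps the compiler from inlining them, so absolute times will differ from the numbers above.

```c
#include <stddef.h>
#include <time.h>

#define MAX_SEND_WC 8   /* assumed value; the real constant is in the ipoib headers */

/* Stand-in for the work completion struct; only the link matters here. */
struct wc_item {
    struct wc_item *next;
    int payload;        /* placeholder field */
};

/* f1 style: link by array index, then terminate the last entry. */
static void link_by_index(struct wc_item *wc, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        wc[i].next = &wc[i + 1];    /* one-past-the-end address on the last pass */
    wc[n - 1].next = NULL;
}

/* f2 style: link by pointer; p stops on the last entry. Requires n >= 1. */
static void link_by_pointer(struct wc_item *wc, size_t n)
{
    struct wc_item *p;

    for (p = wc; p < &wc[n - 1]; p++)
        p->next = p + 1;
    p->next = NULL;
}

/* Returns 1 if the chain visits wc[0..n-1] in order and ends with NULL. */
static int chain_is_valid(const struct wc_item *wc, size_t n)
{
    const struct wc_item *p = wc;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p != &wc[i])
            return 0;
        p = p->next;
    }
    return p == NULL;
}

/* Time 'loops' calls of one linker; returns elapsed CPU seconds. */
static double time_linker(void (*link)(struct wc_item *, size_t),
                          unsigned long loops)
{
    struct wc_item wc[MAX_SEND_WC];
    struct wc_item *volatile sink = NULL;   /* keeps the stores observable */
    unsigned long k;
    clock_t start = clock();

    for (k = 0; k < loops; k++) {
        link(wc, MAX_SEND_WC);
        sink = wc[0].next;
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_linker(link_by_index, 10000000) and time_linker(link_by_pointer, 10000000) from a small main() gives a rough comparison; the result depends on the compiler and optimization switches.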

I believe the ipoib loop performance patch is valid to commit, although not critical.

stan.

>
> Thanks
> Tzachi
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Monday, August 09, 2010 11:23 PM
>> To: Tzachi Dar
>> Cc: ofw at lists.openfabrics.org
>> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
>> removing array index calculations
>>
>> Tzachi Dar wrote:
>>> Have you actually been able to measure a difference?
>>>
>>> I believe that your code is better, but I wonder if it really has an
>>> effect that we can measure.
>>
>> Consider the following code sequence, first with array indexes and
>> then modified to use pointers (with Sean's observations):
>>
>> for( i = 0; i < MAX_SEND_WC; i++ )
>>      wc[i].p_next = &wc[i + 1];
>> wc[MAX_SEND_WC - 1].p_next = NULL;
>>
>> for( p_free=wc; p_free < &wc[MAX_SEND_WC - 1]; p_free++ )
>>      p_free->p_next = p_free + 1;
>> p_free->p_next = NULL;
>>
>> If the MS WDK compiler's optimizations are really 'good', it might
>> optimize both loops to basically the same instruction sequence.  I do
>> not believe this to be the case.
>>
>> The slightly faster execution time is based on the observation that
>> the total number of instructions executed is reduced by skipping the
>> array index arithmetic by use of pointers.
>>
>> Since these loops live in the Tx & Rx speed paths, every little bit
>> helps.
>>
>>
>>>
>>> Thanks
>>> Tzachi
>>>
>>>> -----Original Message-----
>>>> From: Smith, Stan [mailto:stan.smith at intel.com]
>>>> Sent: Wednesday, August 04, 2010 8:43 PM
>>>> To: Tzachi Dar
>>>> Cc: ofw at lists.openfabrics.org
>>>> Subject: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
>>>> removing array index calculations
>>>>
>>>>
>>>> Hello,
>>>>
>>>> While reading the IPOIB code I noticed an opportunity for a minor
>>>> speed enhancement in the CQ callback routines. When linking WC (work
>>>> completion) items into a list, replacing the array index calculations
>>>> with pointer arithmetic makes the loop execute slightly faster.
>>>>
>>>> Worth a commit?
>>>>
>>>> stan.
>>>>
>>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp      Wed Aug 04 10:30:43 2010
>>>> +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp      Wed Aug 04 10:28:59 2010
>>>> @@ -888,9 +888,10 @@
>>>>                 p_port->p_adapter->p_ifc->modify_qp( p_endpt->conn.h_send_qp, &mod_attr );
>>>>                 p_port->p_adapter->p_ifc->modify_qp( p_endpt->conn.h_recv_qp, &mod_attr );
>>>>
>>>> -               for( i = 0; i < MAX_RECV_WC; i++ )
>>>> -                       wc[i].p_next = &wc[i + 1];
>>>> -               wc[MAX_RECV_WC - 1].p_next = NULL;
>>>> +       for( p_free_wc=wc; p_free_wc < &wc[MAX_RECV_WC]; p_free_wc++ )
>>>> +               p_free_wc->p_next = p_free_wc + 1;
>>>> +
>>>> +       (--p_free_wc)->p_next = NULL;
>>>>
>>>>                 do
>>>>                 {
>>>>
>>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04 10:29:33 2010
>>>> +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04 10:28:31 2010
>>>> @@ -1987,7 +1987,6 @@
>>>>         ib_wc_t                         wc[MAX_RECV_WC], *p_free, *p_wc;
>>>>         int32_t                         NBL_cnt, recv_cnt = 0, shortage, discarded;
>>>>         cl_qlist_t                      done_list, bad_list;
>>>> -       size_t                          i;
>>>>         ULONG                           recv_complete_flags = 0;
>>>>         BOOLEAN                         res;
>>>>
>>>> @@ -2017,9 +2016,11 @@
>>>>         cl_qlist_init( &bad_list );
>>>>
>>>>         ipoib_port_ref( p_port, ref_recv_cb );
>>>> -       for( i = 0; i < MAX_RECV_WC; i++ )
>>>> -               wc[i].p_next = &wc[i + 1];
>>>> -       wc[MAX_RECV_WC - 1].p_next = NULL;
>>>> +
>>>> +       for( p_free=wc; p_free < &wc[MAX_RECV_WC]; p_free++ )
>>>> +               p_free->p_next = p_free + 1;
>>>> +
>>>> +       (--p_free)->p_next = NULL;
>>>>
>>>>         /*
>>>>          * We'll be accessing the endpoint map so take a reference
>>>> @@ -5769,7 +5770,6 @@
>>>>         cl_qlist_t              done_list;
>>>>         ipoib_endpt_t           *p_endpt;
>>>>         ip_stat_sel_t           type;
>>>> -       size_t                          i;
>>>>         NET_BUFFER                      *p_netbuffer = NULL;
>>>>         ipoib_send_NB_SG        *s_buf;
>>>>
>>>> @@ -5798,9 +5798,10 @@
>>>>         //cl_qlist_check_validity(&p_port->send_mgr.pending_list);
>>>>         ipoib_port_ref( p_port, ref_send_cb );
>>>>
>>>> -       for( i = 0; i < MAX_SEND_WC; i++ )
>>>> -               wc[i].p_next = &wc[i + 1];
>>>> -       wc[MAX_SEND_WC - 1].p_next = NULL;
>>>> +       for( p_free=wc; p_free < &wc[MAX_SEND_WC]; p_free++ )
>>>> +               p_free->p_next = p_free + 1;
>>>> +
>>>> +       (--p_free)->p_next = NULL;
>>>>
>>>>         do
>>>>         {

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lt.c
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20100823/9185606c/attachment.c>