[ofw] [IPOIB_NDIS6_CM] enhance wc linking loop performance by removing array index calculations
Smith, Stan
stan.smith at intel.com
Mon Aug 9 13:22:52 PDT 2010
Tzachi Dar wrote:
> Have you actually been able to measure a difference?
>
> I believe that your code is better, but I wonder if it really has an
> effect that we can measure.
In the following code sequence, the original array-indexed loop is shown first, followed by the version modified to use pointers (incorporating Sean's observations):

Original:

    for( i = 0; i < MAX_SEND_WC; i++ )
        wc[i].p_next = &wc[i + 1];
    wc[MAX_SEND_WC - 1].p_next = NULL;

Modified to use pointers:

    for( p_free=wc; p_free < &wc[MAX_SEND_WC - 1]; p_free++ )
        p_free->p_next = p_free + 1;
    p_free->p_next = NULL;
If the MS WDK compiler's optimizations were really that good, it might compile both loops down to essentially the same instruction sequence; I do not believe this to be the case.
The slightly faster execution time comes from reducing the total number of instructions executed: iterating with a pointer skips the per-iteration array-index arithmetic needed to form each element's address.
Since these loops live in the Tx & Rx speed paths, every little bit helps.
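For illustration, here is a minimal, self-contained sketch of the two linking styles. The wc_t struct and MAX_WC constant below are simplified stand-ins for ib_wc_t and MAX_SEND_WC/MAX_RECV_WC, not the actual driver definitions:

    #include <stddef.h>
    #include <stdio.h>

    #define MAX_WC 8                /* stand-in for MAX_SEND_WC / MAX_RECV_WC */

    typedef struct wc_s {
        struct wc_s *p_next;        /* simplified stand-in for ib_wc_t */
    } wc_t;

    int main(void)
    {
        wc_t    wc[MAX_WC];
        wc_t    *p_free;
        size_t  i;

        /* Array-indexed style: each iteration computes &wc[i] and &wc[i + 1]. */
        for( i = 0; i < MAX_WC; i++ )
            wc[i].p_next = &wc[i + 1];
        wc[MAX_WC - 1].p_next = NULL;

        /* Pointer style: the loop variable is already the element's address. */
        for( p_free = wc; p_free < &wc[MAX_WC - 1]; p_free++ )
            p_free->p_next = p_free + 1;
        p_free->p_next = NULL;      /* p_free ends on the last element */

        /* Walk the chain to confirm both passes build the same list. */
        for( p_free = wc; p_free != NULL; p_free = p_free->p_next )
            printf( "wc[%ld]\n", (long)(p_free - wc) );

        return 0;
    }

Either form yields the same list; the difference is only in how many address calculations the generated code performs per iteration.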
>
> Thanks
> Tzachi
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Wednesday, August 04, 2010 8:43 PM
>> To: Tzachi Dar
>> Cc: ofw at lists.openfabrics.org
>> Subject: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
>> removing array index calculations
>>
>>
>> Hello,
>>
>> While reading the IPOIB code I noticed a minor speed enhancement
>> possible in the CQ callback routines. When linking WC (work
>> completion) items into a list, replacing the array index
>> calculations with pointer arithmetic lets the loop execute slightly faster.
>>
>> Worth a commit?
>>
>> stan.
>>
>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp  Wed Aug 04 10:30:43 2010
>> +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp  Wed Aug 04 10:28:59 2010
>> @@ -888,9 +888,10 @@
>>      p_port->p_adapter->p_ifc->modify_qp( p_endpt->conn.h_send_qp, &mod_attr );
>>      p_port->p_adapter->p_ifc->modify_qp( p_endpt->conn.h_recv_qp, &mod_attr );
>>
>> -    for( i = 0; i < MAX_RECV_WC; i++ )
>> -        wc[i].p_next = &wc[i + 1];
>> -    wc[MAX_RECV_WC - 1].p_next = NULL;
>> +    for( p_free_wc=wc; p_free_wc < &wc[MAX_RECV_WC]; p_free_wc++ )
>> +        p_free_wc->p_next = p_free_wc + 1;
>> +
>> +    (--p_free_wc)->p_next = NULL;
>>
>>      do
>>      {
>>
>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04 10:29:33 2010
>> +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp  Wed Aug 04 10:28:31 2010
>> @@ -1987,7 +1987,6 @@
>>      ib_wc_t         wc[MAX_RECV_WC], *p_free, *p_wc;
>>      int32_t         NBL_cnt, recv_cnt = 0, shortage, discarded;
>>      cl_qlist_t      done_list, bad_list;
>> -    size_t          i;
>>      ULONG           recv_complete_flags = 0;
>>      BOOLEAN         res;
>>
>> @@ -2017,9 +2016,11 @@
>>      cl_qlist_init( &bad_list );
>>
>>      ipoib_port_ref( p_port, ref_recv_cb );
>> -    for( i = 0; i < MAX_RECV_WC; i++ )
>> -        wc[i].p_next = &wc[i + 1];
>> -    wc[MAX_RECV_WC - 1].p_next = NULL;
>> +
>> +    for( p_free=wc; p_free < &wc[MAX_RECV_WC]; p_free++ )
>> +        p_free->p_next = p_free + 1;
>> +
>> +    (--p_free)->p_next = NULL;
>>
>>      /*
>>       * We'll be accessing the endpoint map so take a reference
>> @@ -5769,7 +5770,6 @@
>>      cl_qlist_t      done_list;
>>      ipoib_endpt_t   *p_endpt;
>>      ip_stat_sel_t   type;
>> -    size_t          i;
>>      NET_BUFFER      *p_netbuffer = NULL;
>>      ipoib_send_NB_SG *s_buf;
>>
>> @@ -5798,9 +5798,10 @@
>>      //cl_qlist_check_validity(&p_port->send_mgr.pending_list);
>>      ipoib_port_ref( p_port, ref_send_cb );
>>
>> -    for( i = 0; i < MAX_SEND_WC; i++ )
>> -        wc[i].p_next = &wc[i + 1];
>> -    wc[MAX_SEND_WC - 1].p_next = NULL;
>> +    for( p_free=wc; p_free < &wc[MAX_SEND_WC]; p_free++ )
>> +        p_free->p_next = p_free + 1;
>> +
>> +    (--p_free)->p_next = NULL;
>>
>>      do
>>      {