[ofw] [IPOIB_NDIS6_CM] enhance wc linking loop performance by removing array index calculations
Smith, Stan
stan.smith at intel.com
Mon Aug 23 10:55:25 PDT 2010
Tzachi Dar wrote:
> By the way, by experiments that xalex has done in the past he saw
> that actually the fastest way to initialize this lists was to create
> the list once on another place and then do a memcopy of the result
> every time you need the list.
>
> That said, I'm not sure that we want to go in that way. The best way
> to improve performance here is very likely to drop the linked lists
> and move to an array...
Yes - an array interface would be the best.
As a side note, using Sean's loop test pgm (modified to closer resemble ipoib wc init code), built in the WDK (x64) environment with no additional optimization switches, Sean's results are replicated:
The wc pointer loop is slightly faster than using array indexes.
MF:\Test-pgms\Loop-Timing>date /t & bin\amd64\lt.exe
Mon 08/23/2010
loops 10000000 s 95622189567 e 95636499284 f 14318180
f1 0.99941
f2 0.78953
where f1 ==
for (i = 0; i < LOOPS; i++) {
struct ibv_wc wc[MAX_SEND_WC];
int i;
for (i = 0; i < MAX_SEND_WC; i++) {
wc[i].next = &wc[i + 1];
}
wc[MAX_SEND_WC - 1].next = NULL;
}
f2 ==
for (i = 0; i < LOOPS; i++) {
struct ibv_wc wc[MAX_SEND_WC];
struct ibv_wc *p;
for (p = wc; p < &wc[MAX_SEND_WC - 1]; p++ ) {
p->next = p + 1;
}
p->next = NULL;
}
I beleve the ipoib loop performance patch is valid to commit, although not critical.
stan.
>
> Thanks
> Tzachi
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Monday, August 09, 2010 11:23 PM
>> To: Tzachi Dar
>> Cc: ofw at lists.openfabrics.org
>> Subject: RE: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
>> removing array index calculations
>>
>> Tzachi Dar wrote:
>>> Have you been actually being able to measure a difference?
>>>
>>> I believe that your code is better, but I wander if it really has an
>>> affect that we can measure.
>>
>> In the following code sequence modified to use pointers (with Sean's
>> observations)
>>
>> for( i = 0; i < MAX_SEND_WC; i++ )
>> wc[i].p_next = &wc[i + 1];
>> wc[MAX_SEND_WC - 1].p_next = NULL;
>>
>> for( p_free=wc; p_free < &wc[MAX_SEND_WC - 1]; p_free++ )
>> p_free->p_next = p_free + 1;
>> p_free->p_next = NULL;
>>
>> If the MS WDK compiler optimizations are really 'good', it might
>> optimize the loops to basically the same instruction sequence. I do
>> not believe this to be the case.
>>
>> The slightly faster execution time is based on the observation that
>> the total number of instructions executed is reduced by skipping the
>> array index arithmetic by use of pointers.
>>
>> Since these loops live in the Tx & Rx speed paths, every little bit
>> helps.
>>
>>
>>>
>>> Thanks
>>> Tzachi
>>>
>>>> -----Original Message-----
>>>> From: Smith, Stan [mailto:stan.smith at intel.com]
>>>> Sent: Wednesday, August 04, 2010 8:43 PM
>>>> To: Tzachi Dar
>>>> Cc: ofw at lists.openfabrics.org
>>>> Subject: [IPOIB_NDIS6_CM] enhance wc linking loop performance by
>>>> removing array index calculations
>>>>
>>>>
>>>> Hello,
>>>>
>>>> While reading IPOIB code I noticed a minor speed enhancement in CQ
>>>> callback routines. When linking WC (work complete) items into a
>>>> list, by removing the array index calculations in favor of pointer
>>>> arithmetic the loop will execute slightly faster.
>>>>
>>>> Worth a commit?
>>>>
>>>> stan.
>>>>
>>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp Wed Aug 04
>>>> 10:30:43 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_endpoint.cpp
>>>> Wed Aug 04 10:28:59 2010 @@ -888,9 +888,10 @@
>>>> p_port->p_adapter->p_ifc->modify_qp( p_endpt-
>>>>> conn.h_send_qp, &mod_attr );
>>>> p_port->p_adapter->p_ifc->modify_qp( p_endpt-
>>>>> conn.h_recv_qp, &mod_attr );
>>>>
>>>> - for( i = 0; i < MAX_RECV_WC; i++ )
>>>> - wc[i].p_next = &wc[i + 1];
>>>> - wc[MAX_RECV_WC - 1].p_next = NULL;
>>>> + for( p_free_wc=wc; p_free_wc < &wc[MAX_RECV_WC];
>>>> p_free_wc++ ) + p_free_wc->p_next = p_free_wc + 1; +
>>>> + (--p_free_wc)->p_next = NULL;
>>>>
>>>> do
>>>> {
>>>>
>>>> --- A/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp Wed Aug 04 10:29:33
>>>> 2010 +++ B/ulp/ipoib_NDIS6_CM/kernel/ipoib_port.cpp Wed Aug 04
>>>> 10:28:31 2010 @@ -1987,7 +1987,6 @@
>>>> ib_wc_t wc[MAX_RECV_WC], *p_free,
>>>> *p_wc; int32_t NBL_cnt, recv_cnt =
>>>> 0, shortage, discarded; cl_qlist_t
>>>> done_list, bad_list; - size_t i;
>>>> ULONG recv_complete_flags = 0;
>>>> BOOLEAN res;
>>>>
>>>> @@ -2017,9 +2016,11 @@
>>>> cl_qlist_init( &bad_list );
>>>>
>>>> ipoib_port_ref( p_port, ref_recv_cb );
>>>> - for( i = 0; i < MAX_RECV_WC; i++ )
>>>> - wc[i].p_next = &wc[i + 1];
>>>> - wc[MAX_RECV_WC - 1].p_next = NULL;
>>>> +
>>>> + for( p_free=wc; p_free < &wc[MAX_RECV_WC]; p_free++ )
>>>> + p_free->p_next = p_free + 1;
>>>> +
>>>> + (--p_free)->p_next = NULL;
>>>>
>>>> /*
>>>> * We'll be accessing the endpoint map so take a reference
>>>> @@ -5769,7 +5770,6 @@ cl_qlist_t
>>>> done_list; ipoib_endpt_t *p_endpt;
>>>> ip_stat_sel_t type;
>>>> - size_t i;
>>>> NET_BUFFER *p_netbuffer = NULL;
>>>> ipoib_send_NB_SG *s_buf;
>>>>
>>>> @@ -5798,9 +5798,10 @@
>>>> //cl_qlist_check_validity(&p_port->send_mgr.pending_list);
>>>> ipoib_port_ref( p_port, ref_send_cb );
>>>>
>>>> - for( i = 0; i < MAX_SEND_WC; i++ )
>>>> - wc[i].p_next = &wc[i + 1];
>>>> - wc[MAX_SEND_WC - 1].p_next = NULL;
>>>> + for( p_free=wc; p_free < &wc[MAX_SEND_WC]; p_free++ )
>>>> + p_free->p_next = p_free + 1;
>>>> +
>>>> + (--p_free)->p_next = NULL;
>>>>
>>>> do
>>>> {
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lt.c
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20100823/9185606c/attachment.c>
More information about the ofw
mailing list