[ofw] winverbs ND provider

Tzachi Dar tzachid at mellanox.co.il
Tue Jan 12 09:47:23 PST 2010


Please see below.

Thanks
Tzachi

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Tuesday, January 12, 2010 8:55 AM
> To: ofw_list
> Subject: [ofw] winverbs ND provider
> 
> I've submitted a set of 4 patches against the stack to 
> address performance differences between the IBAL and winverbs 
> providers.  (One of the patches is to fix a bug that I hit 
> while working on the other patches.)
> 
> Based on ndconn results, the winverbs ND connection rate is 
> still less than that reported for the IBAL ND provider.  I'm 
> not sure how much emphasis we should place on the ndconn 
> rates, however, or connection rates at all for that matter.
> The main difference between the providers is that winverbs 
> uses multiple threads to process connections asynchronously.  
> It's possible that this actually lowers ndconn numbers, but 
> it would take more extensive testing to confirm this.
> 
How many threads were running when you did the test? Only one, or
many more?
Can you also send the numbers that you got?

Please also note that completion ports are a very good and efficient way
to program as long as they are used alone. That is, the completion port
threads should do all the work. In the design that you have used, there
is a worker thread in the program, and then there is a thread on the
completion port. This means that after the kernel has completed a
request, the completion port thread wakes up, sets the event, and only
then does the application thread wake up.
Given that the test program cannot be changed, I guess that if the
program thread is waiting on the events, then setting the events from
the kernel is faster (it saves the wakeup of one thread). Please also
note that the use of overlapped operations creates an IRP in the kernel,
which is also not that efficient.
I guess that all of this can save about 10us per operation, and I would
like to see the number of connects that you make to understand whether
this is really the problem.
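
To make the two wakeup paths concrete, here is a minimal user-mode
sketch. It is only my illustration of the design, not the winverbs code;
the WvRequest wrapper and the IocpWorker/WaitDirect names are made up.

#include <windows.h>

/* Hypothetical request wrapper: the OVERLAPPED goes to the kernel, the
 * event is what the application thread ends up waiting on. */
struct WvRequest {
    OVERLAPPED ov;
    HANDLE     doneEvent;
};

/* Path A (the current design, as I understand it): the completion-port
 * worker wakes up first, then wakes the application thread -- two
 * wakeups per request. */
DWORD WINAPI IocpWorker(LPVOID ctx)
{
    HANDLE iocp = (HANDLE)ctx;
    DWORD bytes;
    ULONG_PTR key;
    OVERLAPPED *ov;

    for (;;) {
        /* First wakeup: this worker, when the kernel completes the IRP. */
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
            break;
        /* Second wakeup: the application thread blocked on doneEvent. */
        struct WvRequest *req = CONTAINING_RECORD(ov, struct WvRequest, ov);
        SetEvent(req->doneEvent);
    }
    return 0;
}

/* Path B (what I suggest is faster for ndconn): no completion port is
 * bound to the handle, ov->hEvent is set, and the kernel signals the
 * waiting application thread directly -- one wakeup per request. */
void WaitDirect(OVERLAPPED *ov)
{
    /* The overlapped DeviceIoControl was issued with this ov earlier. */
    WaitForSingleObject(ov->hEvent, INFINITE);
}

If that second wakeup really happens on every connect, it would fit the
roughly 10us per operation I mentioned above.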

> The winverbs ND provider was also updated to use inline data. 
>  In my testing, this gave similar latency numbers as the ibal 
> ND provider.
This is good; latency and BW are probably the most important aspects of
performance.


> If anyone is aware of other performance differences between 
> the two providers, please let me know as soon as possible.  I 
> will continue to stress the code for the next couple of days 
> before committing the changes to the trunk.
> 
There is one issue that I have noticed. It is very likely not related
to the ndconn test, but it still bothers me:
There are a few operations that bypass the OS, for example send and
receive, but also notify. In the winverbs design this is not the case;
that is, notify does go to the kernel. See CWVCompletionQueue::Notify.
I guess there are two reasons why this problem is not that big:
1) Performance-oriented tests don't arm the CQ, they poll (see the
sketch after this paragraph).
2) If arm_cq was called, then when there is a new event the kernel will
have to be involved anyway.
I understand that it will be very hard to change this, but I guess that
in the long run we will have to think of something, since IBAL was able
to eliminate that.
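
As a rough illustration of why polling stays in user space while arming
does not, here is a hypothetical sketch. The CQ entry layout, the valid
flag, and the 0x222000 IOCTL code are all made up and are not the actual
winverbs definitions.

#include <windows.h>

/* Hypothetical CQ entry layout; the real one lives in the HCA driver. */
struct Completion {
    ULONG64 wr_id;
    ULONG   status;
    volatile LONG valid;   /* ownership flag written by the HCA */
};

/* Polling: completions are read straight out of a CQ ring that the HCA
 * writes into user-mapped memory, so there is no kernel transition at
 * all.  Performance tests sit in this loop and never arm the CQ. */
BOOL PollCq(struct Completion *ring, ULONG size, ULONG *head,
            struct Completion *out)
{
    struct Completion *e = &ring[*head % size];
    if (!InterlockedCompareExchange(&e->valid, 0, 1))
        return FALSE;              /* nothing new; caller keeps spinning */
    *out = *e;
    ++*head;
    return TRUE;
}

/* Arming: Notify has to ask the driver to request an interrupt, which
 * today means an IOCTL, i.e. one user-to-kernel transition per arm. */
HRESULT NotifyCq(HANDLE hca_file, ULONG cq_handle, OVERLAPPED *ov)
{
    DWORD bytes;
    /* 0x222000 is a placeholder, not a real winverbs IOCTL code. */
    if (!DeviceIoControl(hca_file, 0x222000, &cq_handle,
                         sizeof(cq_handle), NULL, 0, &bytes, ov))
    {
        DWORD err = GetLastError();
        if (err != ERROR_IO_PENDING)
            return HRESULT_FROM_WIN32(err);
    }
    return S_OK;
}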

There is another issue that one should think of: using fast dispatch
routines instead of the normal dispatch routines. This technique
dramatically decreases the time it takes to move from user mode to
kernel mode. Since the ndconn test crosses from user to kernel many
times, it should improve things.
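
For reference, this is roughly how a WDM driver registers a fast-I/O
path for device control. The sketch below is only an outline under my
assumptions; the handler name and the 0x222004 IOCTL code are made up
and are not from the winverbs driver.

#include <ntddk.h>

static FAST_IO_DISPATCH g_fastIoDispatch;

/* Called by the I/O manager without building an IRP.  Returning FALSE
 * makes the I/O manager fall back to the normal IRP_MJ_DEVICE_CONTROL
 * dispatch routine. */
static BOOLEAN WvFastIoDeviceControl(
    PFILE_OBJECT FileObject, BOOLEAN Wait,
    PVOID InputBuffer, ULONG InputBufferLength,
    PVOID OutputBuffer, ULONG OutputBufferLength,
    ULONG IoControlCode, PIO_STATUS_BLOCK IoStatus,
    PDEVICE_OBJECT DeviceObject)
{
    UNREFERENCED_PARAMETER(FileObject);
    UNREFERENCED_PARAMETER(Wait);
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(InputBuffer);
    UNREFERENCED_PARAMETER(InputBufferLength);
    UNREFERENCED_PARAMETER(OutputBuffer);
    UNREFERENCED_PARAMETER(OutputBufferLength);

    /* Handle only the hot, synchronous IOCTLs here; 0x222004 is just a
     * placeholder code for this sketch. */
    if (IoControlCode != 0x222004)
        return FALSE;

    /* ... validate the buffers and complete the request inline ... */
    IoStatus->Status = STATUS_SUCCESS;
    IoStatus->Information = 0;
    return TRUE;
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(RegistryPath);

    RtlZeroMemory(&g_fastIoDispatch, sizeof(g_fastIoDispatch));
    g_fastIoDispatch.SizeOfFastIoDispatch = sizeof(FAST_IO_DISPATCH);
    g_fastIoDispatch.FastIoDeviceControl = WvFastIoDeviceControl;
    DriverObject->FastIoDispatch = &g_fastIoDispatch;

    /* The normal MajorFunction[] entries are still needed as the
     * fallback path. */
    return STATUS_SUCCESS;
}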



> - Sean
> 
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 


