[ofw] winverbs ND provider

Fab Tillier ftillier at microsoft.com
Tue Jan 12 10:29:00 PST 2010


Hi Tzachi,

Tzachi Dar wrote on Tue, 12 Jan 2010 at 09:47:23

> In the design that you have used, there is a worker thread in the
> program, and then there is a thread for the completion queue. This
> means that after the kernel has completed a request, the completion
> queue thread wakes up and sets the event, and only then does the next
> thread wake up.

I'm not sure I follow - there are threads in the WinVerbs ND provider?
 
> Please also note that the use of overlapped operations creates an IRP
> in the kernel, which is also not that efficient. I guess that all of
> this could save about 10us per operation, and I would like to see the
> number of connects that you make to understand whether this is really
> the problem.

This is no different than the IBAL provider, though.
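
For what it's worth, both providers issue their async requests the same way - an overlapped DeviceIoControl against the provider's device, which is what allocates the IRP in question.  A minimal sketch (the IOCTL code and buffers here are hypothetical, not the actual WinVerbs ones):

    #include <windows.h>
    #include <winioctl.h>

    // Hypothetical control code, a stand-in for a real provider IOCTL.
    #define IOCTL_ND_SOME_REQUEST \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

    // Issue one asynchronous request; each call like this allocates an
    // IRP in the kernel.  Completion is reported later through the I/O
    // completion port (or event) associated with hDevice.
    BOOL IssueAsyncRequest(HANDLE hDevice, OVERLAPPED* pOv,
                           void* pIn, DWORD cbIn, void* pOut, DWORD cbOut)
    {
        BOOL fRet = DeviceIoControl(hDevice, IOCTL_ND_SOME_REQUEST,
                                    pIn, cbIn, pOut, cbOut, NULL, pOv);
        return fRet || GetLastError() == ERROR_IO_PENDING;
    }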

>> If anyone is aware of other performance differences between
>> the two providers, please let me know as soon as possible.  I
>> will continue to stress the code for the next couple of days
>> before committing the changes to the trunk.
>> 
> There is one issue that I have noticed. It is very likely not related
> to the ndconn test, but it still bothers me: there are a few
> operations that bypass the OS - for example, send and receive, but
> also notify. In the WinVerbs design this is not the case; that is,
> notify does go to the kernel. See CWVCompletionQueue::Notify.

Notify must go to the kernel - it's an async operation.  It goes to the kernel with the IBAL ND provider too.
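
To make the asynchronous shape of notify concrete, here is a rough sketch of the consumer side, assuming completions are reaped through an I/O completion port (the names are illustrative, not the actual provider code):

    #include <windows.h>

    // Block until the kernel completes a pending notify request, i.e.
    // until the HCA signals a CQ event.  The notify itself was posted
    // earlier with an OVERLAPPED; there is no synchronous, user-mode-
    // only way to get this behavior.
    BOOL WaitForCqEvent(HANDLE hIocp)
    {
        DWORD cb;
        ULONG_PTR key;
        OVERLAPPED* pOv;
        return GetQueuedCompletionStatus(hIocp, &cb, &key, &pOv, INFINITE);
    }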

> I guess that there are two reasons why this problem is not that big:
> 1) Performance-oriented tests don't arm the CQ; they poll.
> 2) If arm_cq has been called, then when there is a new event the
> kernel will have to be involved anyway.
> I understand that it will be very hard to change this, but I guess
> that in the long term we will have to think of something, since IBAL
> was able to eliminate that.

How was IBAL able to eliminate kernel transitions for CQ notifications?
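
For point 1, the polling path stays entirely in user mode - completions are read straight out of user-mapped CQ memory.  A sketch of what a latency benchmark does, assuming the NDSPI v1 INDCompletionQueue interface from ndspi.h:

    #include <ndspi.h>

    // Busy-poll until one completion arrives.  No Notify and no kernel
    // transition - the CQ entries are read directly from user-mapped
    // memory, which is why point 1 holds for latency benchmarks.
    void SpinForCompletion(INDCompletionQueue* pCq, ND_RESULT* pResult)
    {
        while (pCq->GetResults(&pResult, 1) == 0)
            ;
    }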

> There is another issue that one should think about: using fast
> dispatch routines instead of the normal dispatch routines. This
> technique dramatically decreases the time it takes to move from user
> mode to kernel mode. Since the ndconn test moves from user to kernel
> many times, this should improve things.

The IBAL ND provider doesn't do this, though, so the perf difference can't be related to fast dispatch.  Further, the operations performed by the async calls (Connect, CompleteConnect, Accept, Disconnect) are all inherently asynchronous.  The latter three perform QP state modifications, which could thus never be handled in a fast dispatch routine.  The only call that might benefit from a fast dispatch handler is GetConnectionRequest, since a connection request may already be queued when it is called.  I don't know whether the fast dispatch routines are beneficial when using I/O completion ports, though.
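
For reference, the mechanism being discussed looks roughly like this in a WDM driver - the I/O manager calls the registered fast routine first and only builds an IRP if it returns FALSE.  The handler body here is a placeholder, not actual WinVerbs code:

    #include <ntddk.h>

    // Called by the I/O manager before any IRP is allocated.  Return
    // TRUE only if the request was completed here (e.g. a
    // GetConnectionRequest when a connection request is already
    // queued); returning FALSE falls back to the normal IRP path.
    BOOLEAN
    WvFastIoDeviceControl(PFILE_OBJECT FileObject, BOOLEAN Wait,
                          PVOID InputBuffer, ULONG InputBufferLength,
                          PVOID OutputBuffer, ULONG OutputBufferLength,
                          ULONG IoControlCode, PIO_STATUS_BLOCK IoStatus,
                          PDEVICE_OBJECT DeviceObject)
    {
        return FALSE;  // placeholder: always take the IRP path
    }

    static FAST_IO_DISPATCH WvFastIoDispatch;

    // Typically done in DriverEntry.
    VOID WvRegisterFastIo(PDRIVER_OBJECT DriverObject)
    {
        RtlZeroMemory(&WvFastIoDispatch, sizeof(WvFastIoDispatch));
        WvFastIoDispatch.SizeOfFastIoDispatch = sizeof(FAST_IO_DISPATCH);
        WvFastIoDispatch.FastIoDeviceControl = WvFastIoDeviceControl;
        DriverObject->FastIoDispatch = &WvFastIoDispatch;
    }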

-Fab 



