[ofw] winverbs ND provider

Sean Hefty sean.hefty at intel.com
Tue Jan 12 10:25:21 PST 2010


>How many threads were running when you did the test? Only one, or many
>more?
>Can you also send the numbers that you have received?

I just sent the numbers in a separate email.

Another difference between ibal and winverbs is that winverbs uses
socket calls and reserves actual port numbers.  Still, I can't imagine
that the socket calls account for that much overhead.

>Please also note that completion ports are a very good and efficient
>way to program as long as they are used alone. That is, the threads of
>the completion port should do all the work. In the design that you have
>used, there is a worker thread in the program, and then there is a
>thread for the completion port. This means that after the kernel has
>completed a request, the completion-port thread wakes up and sets the
>event, and then the next thread wakes up.

Winverbs does not add threads to user space.  In the kernel it makes use
of the system thread pool to handle QP transitions.  This was done based
on work that Fab did showing an improvement in the number of QP
transitions that could be done using an asynchronous model, versus a
synchronous call.

The use of events versus completion ports is controlled by the app.

>Given that the test program cannot be changed, I guess that if the
>program thread is waiting on the events, then setting the events from
>the kernel is faster (it saves the wakeup of one thread). Please also
>note that the use of overlapped operations creates an IRP in the
>kernel, which is also not that efficient.
>I guess that all this can save about 10us per operation, and I would
>like to see the number of connects that you make to understand if this
>is really the problem.

In the kernel, winverbs simply completes the IRP.  The rest is handled
by the framework and kernel.

>There are a few operations that bypass the OS, for example send and
>receive, but also notify. In the winverbs design this is not the case.
>That is, notify does go to the kernel. See CWVCompletionQueue::Notify.
>I guess that there are two reasons why this problem is not that big:
>1) performance oriented tests don't arm the CQ, they poll.
>2) if arm_cq was called, then when there is a new event the kernel will
>have to be involved anyway.
>I understand that it will be very hard to change that, but I guess that
>in the long term we will have to think of something, since IBAL was
>able to eliminate that.

Both winverbs and ibal ND providers issue IOCTLs as part of CQ notify.
In both cases, the IRPs are queued in the kernel until a CQ event
occurs, and the IRPs are completed from the interrupt dispatch
routine.

Can you describe what difference you see between the two in more detail?

>There is another issue that one should think of: using fast dispatch
>routines instead of the normal dispatch routines. This technique
>dramatically decreases the time it takes to move from user to kernel.
>Since the nd_conn test moves from user to kernel many times, it would
>improve things.

I'm not sure what specifically you're referring to here.  From my
understanding, winverbs should actually be faster than ibal for user to
kernel transitions.  Ibal carries the UMV buffer separately from its own
buffer and requires additional calls to copy the UMV buffer to the
kernel.  Winverbs packages the UMV buffer as part of a single buffer.



