[Openib-windows] [RFC] IRP-based verbs

Tzachi Dar tzachid at mellanox.co.il
Fri Sep 9 04:10:23 PDT 2005


Hi fab,

Some more questions and comments below.

Thanks
Tzachi

>-----Original Message-----
>From: Fab Tillier [mailto:ftillier at silverstorm.com]
>Sent: Thursday, September 08, 2005 10:00 PM
>To: 'Tzachi Dar'; openib-windows at openib.org
>Subject: RE: [Openib-windows] [RFC] IRP-based verbs
>
>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>> Sent: Thursday, September 08, 2005 11:28 AM
>>
>> Hi Fab,
>> Before we give you comments about the proposed change there are a few
>things
>> that we would like to :
>>
>> 1) Do you intend that all the kernel mode ULPs to move to an IOCTL based
>> model? That is do you want the IPOIB model to create an IRP every time it
>is
>> going to talk with IBAL?
>
>Initially, this would be deployed just between IBAL and the HCA driver.
>Over
>time ULPs would be transitioned to it too.
>
>Speed-path (post/poll) operations would still be direct-call.  I'd like to
>eventually have a direct call interface similar to Microsoft's new kernel
>socket
>model (WSK), but that would still take as input an IRP for completion
>notifications.
>
The IOCTL interface is a very complicated interface that is mainly used for
communicating between user mode and kernel applications. Since it is so
complicated there will probably be wrapper functions around it, so I believe
that there will be no need to use the IOCTLs (at all) for communicating in
the kernel. As for the user mode, we can create a different library that
will be used to communicate with the kernel; it will use IOCTLs, but these
IOCTLs will be translated to the regular "kernel" interface at a high level.
Please note that if we anticipate stress on the calls from user to kernel,
then IOCTLs are not the best solution even when passing from user to kernel.
We can of course extend this discussion if we see that there is a need.


>> Does it mean that the entire interface will change?
>
>All verbs that result in command interface calls would be issued via IOCTL.
>It
>would require the ULPs to change to allocate, format, and issue the IRPs.
>It
>allows using the I/O completion callback processing provided by the OS
>rather
>than implementing custom callback mechanisms.  Think of it as evolving the
>stack
>to be designed for Windows.
>
If you look at the Windows components that are "available" to the public,
you will find that they don't use IOCTLs to communicate with each
other.

>> 2) Currently the hardware driver doesn't support operations at dispatch
>mode
>> (that is Create-QP will block below the IBAL library). As a result I
>don't
>> understand how you are going to establish the goal of allowing all
>operations
>> to work from dispatch level.
>
>The HCA driver will need to be fixed.  Requiring all verbs to be issued at
>passive, especially considering the command interface is asynchronous, is a
>lousy design and imposes all sorts of restrictions on kernel clients.  This
>design flaw requires ULPs to have passive level threads to perform work -
>work
>which may be delayed by I/O completion processing from other clients.  It
>will
>also enable new functionality in ULPs that do not have a flexible way to
>get
>into a passive level thread context.
>
>Specific examples include:
>- Local mad processing, which currently requires a context switch from the
>CQ
>callback.  In a busy system, the MAD processing may be delayed to such an
>extent
>that the node appears unresponsive to an SM.
>- Direct Data Buffer Descriptor for SRP, which can't be implemented because
>all
>SRP data path entry points are invoked at DISPATCH.
>
The current version of the driver doesn't support this. The next version of
the driver will also not support this (at least this is the plan for now).
How are you going to work around this problem?
One more issue to notice is that although the command interface allows for
calls to be issued at dispatch level, it doesn't allow an unlimited number of
simultaneous requests. This requires locks, queues and so on, which will move
the complication to other places.
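
To make the point about locks and queues concrete, here is a rough sketch in
plain C of what limiting outstanding commands could look like. All names and
limits (cmd_queue, MAX_OUTSTANDING, MAX_PENDING and so on) are hypothetical
and not taken from the HCA driver, and the locking is only indicated in
comments; a real driver would have to hold a spin lock around both functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OUTSTANDING 2   /* hypothetical limit on commands in flight */
#define MAX_PENDING     8   /* hypothetical depth of the software queue */

typedef enum { CMD_POSTED, CMD_QUEUED, CMD_REJECTED } cmd_result;

typedef struct {
    int    outstanding;           /* commands currently posted to the hardware */
    int    pending[MAX_PENDING];  /* FIFO ring of deferred command ids */
    size_t head;                  /* index of the oldest deferred command */
    size_t count;                 /* number of deferred commands */
} cmd_queue;

void cmd_queue_init(cmd_queue *q)
{
    q->outstanding = 0;
    q->head = 0;
    q->count = 0;
}

/* Submit a command. Assumption: the caller holds the queue lock. */
cmd_result cmd_submit(cmd_queue *q, int cmd_id)
{
    if (q->outstanding < MAX_OUTSTANDING) {
        q->outstanding++;          /* a free slot exists: post immediately */
        return CMD_POSTED;
    }
    if (q->count == MAX_PENDING)
        return CMD_REJECTED;       /* software queue is full as well */
    q->pending[(q->head + q->count) % MAX_PENDING] = cmd_id;
    q->count++;
    return CMD_QUEUED;             /* deferred until a slot frees up */
}

/* Called when one command completes. Posts the oldest deferred command,
 * if any, and returns its id; returns -1 when nothing was waiting. */
int cmd_complete(cmd_queue *q)
{
    int next;

    assert(q->outstanding > 0);
    q->outstanding--;
    if (q->count == 0)
        return -1;
    next = q->pending[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    q->outstanding++;              /* the deferred command takes the free slot */
    return next;
}
```

Even in this small form the caller has to handle three outcomes (posted,
queued, rejected), which is the kind of complication I mean.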

>> 3) What is the impact on the client that you see from this change? Will
>it
>> bring higher BW? Lower latency? Increase of connection rate?
>
>This will simplify the code base considerably by eliminating context
>switches
>currently required to account for a bad HCA driver model (and the reference
>counting those context switches require).  It also creates a common path
>for
>kernel and user-mode verb processing, and lets clients decide how they want
>to
>process their verbs.  In effect, it removes policy imposed by the HCA
>driver
>(not the HW!).
>
>There should be no negative impact on latency or bandwidth, it just takes
>advantage of capabilities provided by the OS rather than duplicating
>functionality, and will allow us to move to a more efficient and smaller
>code
>base.
>
>For IBAL as a client, it allows elimination of the destroy thread, the
>local mad
>processing thread, passive threads for verb processing, and the
>asynchronous
>destroy callback mechanism to name a few (I'm probably missing others).
>
>For the HCA driver, it eliminates the protection context abstraction by
>allowing
>the driver to use the RequestorMode of the IRP, and should allow further
>cleanup
>of that driver.
>
>The change to return a busy status rather than cascading destructions
>further
>eliminates code from IBAL once ULPs transition to this new interface.
>
I believe that although the new model will allow removing some of the
threads that you have mentioned, other threads will probably be needed in
their place. I also believe that using reference counts on the objects will
help make the destruction of objects simpler. We have to make sure that we
have a complete design before we start coding.
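
As an illustration of the reference-count idea, here is a minimal sketch in
plain C. The names (ib_object, obj_ref, obj_deref, mark_destroyed) are made up
for this example and are not existing IBAL or HCA driver code. The counter is
a plain int to keep the sketch portable; in the kernel it would have to be
updated atomically, for example with InterlockedIncrement and
InterlockedDecrement.

```c
#include <assert.h>

typedef struct ib_object {
    int  ref_count;                       /* starts at 1 for the creator */
    int  destroyed;                       /* set by the cleanup routine below */
    void (*cleanup)(struct ib_object *);  /* runs when the last reference goes */
} ib_object;

/* Example cleanup routine: only records that destruction happened. */
void mark_destroyed(ib_object *obj)
{
    obj->destroyed = 1;
}

void obj_init(ib_object *obj, void (*cleanup)(ib_object *))
{
    obj->ref_count = 1;
    obj->destroyed = 0;
    obj->cleanup = cleanup;
}

/* Take a reference, e.g. for the duration of an outstanding request. */
void obj_ref(ib_object *obj)
{
    assert(obj->ref_count > 0);
    obj->ref_count++;
}

/* Drop a reference. Returns 1 if this call released the last reference
 * and ran the cleanup routine, 0 otherwise. */
int obj_deref(ib_object *obj)
{
    assert(obj->ref_count > 0);
    if (--obj->ref_count == 0) {
        obj->cleanup(obj);
        return 1;
    }
    return 0;
}
```

With this scheme a destroy call is just the creator dropping its own
reference; the cleanup runs whenever the last outstanding user lets go,
without a dedicated destroy thread having to wait for it.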

>- Fab