[Openib-windows] [RFC] IRP-based verbs

Fab Tillier ftillier at silverstorm.com
Thu Sep 8 11:59:59 PDT 2005


> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Thursday, September 08, 2005 11:28 AM
> 
> Hi Fab,
> Before we give you comments about the proposed change there are a few things
> that we would like to understand:
>
> 1) Do you intend for all the kernel mode ULPs to move to an IOCTL based
> model? That is, do you want the IPOIB model to create an IRP every time it is
> going to talk with IBAL?

Initially, this would be deployed just between IBAL and the HCA driver.  Over
time ULPs would be transitioned to it too.

Speed-path (post/poll) operations would still be direct-call.  I'd like to
eventually have a direct call interface similar to Microsoft's new kernel socket
model (WSK), but that would still take as input an IRP for completion
notifications.

> Does it mean that the entire interface will change?

All verbs that result in command interface calls would be issued via IOCTL.  It
would require the ULPs to change to allocate, format, and issue the IRPs.  It
allows using the I/O completion callback processing provided by the OS rather
than implementing custom callback mechanisms.  Think of it as evolving the stack
to be designed for Windows.
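
To make the request/completion flow concrete, here is a toy model in Python.  It
is not Windows kernel code and not the proposed interface: `Request`,
`HcaDriver`, `issue`, and `process_pending` are hypothetical stand-ins for an
IRP, the HCA driver, issuing the IOCTL, and the command interface finishing a
command.  It only illustrates that the caller gets a pending status back at once
and picks its own completion processing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

STATUS_PENDING = "pending"
STATUS_SUCCESS = "success"


@dataclass
class Request:
    """Hypothetical stand-in for an IRP carrying one verb."""
    verb: str
    args: dict
    on_complete: Optional[Callable[["Request"], None]] = None
    status: str = STATUS_PENDING
    result: Optional[dict] = None


class HcaDriver:
    """Toy driver: queues verbs and completes them later, never blocking."""

    def __init__(self) -> None:
        self._queue = deque()
        self._next_handle = 1

    def issue(self, req: Request) -> str:
        # The caller gets 'pending' back immediately; it does not wait.
        self._queue.append(req)
        return STATUS_PENDING

    def process_pending(self) -> None:
        # Models the asynchronous command interface finishing each command.
        while self._queue:
            req = self._queue.popleft()
            req.result = {"handle": self._next_handle}
            self._next_handle += 1
            req.status = STATUS_SUCCESS
            if req.on_complete is not None:
                req.on_complete(req)  # client-chosen completion processing


completed = []
driver = HcaDriver()
req = Request("create_qp", {"max_send_wr": 16}, on_complete=completed.append)
issue_status = driver.issue(req)  # returns before the command has run
driver.process_pending()          # the completion callback fires here
```

In the real stack the completion callback would be the I/O completion routine
the client attaches to the IRP, so no custom callback mechanism is needed.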

> 2) Currently the hardware driver doesn't support operations at dispatch level
> (that is, Create-QP will block below the IBAL library). As a result I don't
> understand how you are going to achieve the goal of allowing all operations
> to work from dispatch level.

The HCA driver will need to be fixed.  Requiring all verbs to be issued at
passive level, especially considering the command interface is asynchronous, is
a lousy design and imposes all sorts of restrictions on kernel clients.  This
design flaw requires ULPs to have passive level threads to perform work - work
which may be delayed by I/O completion processing from other clients.  Fixing it
will also enable new functionality in ULPs that do not have a flexible way to
get into a passive level thread context.

Specific examples include:
- Local MAD processing, which currently requires a context switch from the CQ
callback.  In a busy system, the MAD processing may be delayed to such an extent
that the node appears unresponsive to an SM.
- Direct Data Buffer Descriptor for SRP, which can't be implemented because all
SRP data path entry points are invoked at DISPATCH.

> 3) What is the impact on the client that you see from this change? Will it
> bring higher BW? Lower latency? Increase of connection rate?

This will simplify the code base considerably by eliminating context switches
currently required to account for a bad HCA driver model (and the reference
counting those context switches require).  It also creates a common path for
kernel and user-mode verb processing, and lets clients decide how they want to
process their verbs.  In effect, it removes policy imposed by the HCA driver
(not the HW!).

There should be no negative impact on latency or bandwidth.  The change just
takes advantage of capabilities provided by the OS rather than duplicating
functionality, and will allow us to move to a more efficient and smaller code
base.

For IBAL as a client, it allows elimination of the destroy thread, the local MAD
processing thread, passive threads for verb processing, and the asynchronous
destroy callback mechanism, to name a few (I'm probably missing others).

For the HCA driver, it eliminates the protection context abstraction by allowing
the driver to use the RequestorMode of the IRP, and should allow further cleanup
of that driver.

The change to return a busy status rather than cascading destructions further
eliminates code from IBAL once ULPs transition to this new interface.
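
As a rough illustration of busy status versus cascading destruction, here is a
toy Python model.  `Resource`, `destroy`, and the status strings are
hypothetical, and the PD/QP pairing is only an example of a parent with a
dependent child; none of this is the actual IBAL or HCA driver code.

```python
STATUS_SUCCESS = "success"
STATUS_BUSY = "busy"


class Resource:
    """Hypothetical verbs object (e.g. a PD) that tracks its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = set()
        self.destroyed = False
        if parent is not None:
            parent.children.add(self)

    def destroy(self):
        # No cascading: refuse while children exist and leave it to the
        # caller to tear them down first.
        if self.children:
            return STATUS_BUSY
        if self.parent is not None:
            self.parent.children.discard(self)
        self.destroyed = True
        return STATUS_SUCCESS


pd = Resource("pd")
qp = Resource("qp", parent=pd)
first = pd.destroy()   # busy: the QP still depends on the PD
second = qp.destroy()  # success
third = pd.destroy()   # success now that the PD has no children
```

The parent never has to walk and destroy its children, which is the kind of
code that goes away when the caller owns the teardown order.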

- Fab



