[ofw] Loading drivers on LHS
Fab Tillier
ftillier at windows.microsoft.com
Mon May 14 17:14:29 PDT 2007
Hi Jan,
The problem you're facing is independent of the interface model and
driver model (filter driver vs. root enumerated). What you need is for
the bus driver to create PDOs based on information in the registry. The
point I was trying to make was that a kernel DLL model makes driver
updates very cumbersome, and that the existing PnP interface model works
quite well. Neither a PnP interface nor a kernel DLL entry point solves
the problem of putting an arbitrary device in the HCA's device tree.
Note also that your example of IPoIB isn't quite right, as there is a
local physical hardware entity represented by the IPoIB PDO - the HCA's
port. What you want to do is create a PDO for a device connected to the
IB fabric without any information about the fabric topology or even
connectivity to it - something that seems distinctly non-PnP. That
said, I agree that for your case extending the bus driver to enumerate
devices from the registry makes sense, and it would be interesting to
find out how many other kernel drivers need this functionality too.
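For what it's worth, the registry side of that could be fairly small. A
minimal sketch, assuming a hypothetical "ChildHardwareIds" REG_MULTI_SZ
value under a key the bus driver already owns (nothing in the current
stack uses that name):

#include <ntddk.h>

/* RtlQueryRegistryValues calls this once per string of a REG_MULTI_SZ
 * value (each string is presented as REG_SZ), so every call hands us one
 * child hardware ID the bus driver should enumerate. */
static NTSTATUS ChildIdQueryRoutine(PWSTR ValueName, ULONG ValueType,
                                    PVOID ValueData, ULONG ValueLength,
                                    PVOID Context, PVOID EntryContext)
{
    UNREFERENCED_PARAMETER(ValueName);
    UNREFERENCED_PARAMETER(Context);
    UNREFERENCED_PARAMETER(EntryContext);

    if (ValueType != REG_SZ || ValueLength == 0)
        return STATUS_INVALID_PARAMETER;

    /* A real implementation would save the ID and later create a PDO for
     * it when answering IRP_MN_QUERY_DEVICE_RELATIONS. */
    DbgPrint("IB bus: registry-configured child hardware ID: %ws\n",
             (PWSTR)ValueData);
    return STATUS_SUCCESS;
}

NTSTATUS ReadChildHardwareIds(PCUNICODE_STRING RegistryPath)
{
    RTL_QUERY_REGISTRY_TABLE table[2];

    RtlZeroMemory(table, sizeof(table));
    table[0].QueryRoutine = ChildIdQueryRoutine;
    table[0].Flags = RTL_QUERY_REGISTRY_REQUIRED;
    table[0].Name = L"ChildHardwareIds";      /* hypothetical value name */

    return RtlQueryRegistryValues(RTL_REGISTRY_ABSOLUTE,
                                  RegistryPath->Buffer, table, NULL, NULL);
}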
For power management, IBAL treats a power down event as an HCA removal.
This was done because the HCA driver did not handle power management.
Now that it does, the HCA should really remain registered with IBAL, and
the HCA driver should handle requests coming in when a device is powered
down. Treating HCA power transitions as device removals was a necessary
hack at the time, but it is still a hack, and a cleaner solution would
be very welcome.
Keep in mind too that a filter driver model does not preclude the IBAL
driver from creating a single, well-known-named device from its
DriverEntry routine or when the first HCA is added. There is a problem
with the filter driver approach in that the IBAL driver doesn't get
loaded unless there's an HCA enabled in the system, but this could
probably be worked around using a service that loads the driver (though
that probably introduces more driver update difficulties; I don't know
for sure).
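For reference, that control device is just the usual pattern; a minimal
sketch (the \Device\ibal and \DosDevices\ibal names are placeholders,
not necessarily what IBAL uses today):

#include <ntddk.h>

#define IBAL_NT_DEVICE_NAME  L"\\Device\\ibal"      /* assumed names, */
#define IBAL_DOS_DEVICE_NAME L"\\DosDevices\\ibal"  /* not the real ones */

NTSTATUS CreateControlDevice(PDRIVER_OBJECT DriverObject)
{
    UNICODE_STRING ntName, dosName;
    PDEVICE_OBJECT controlDevice;
    NTSTATUS status;

    RtlInitUnicodeString(&ntName, IBAL_NT_DEVICE_NAME);
    RtlInitUnicodeString(&dosName, IBAL_DOS_DEVICE_NAME);

    /* A plain named device object, independent of any HCA stack. */
    status = IoCreateDevice(DriverObject, 0, &ntName, FILE_DEVICE_UNKNOWN,
                            FILE_DEVICE_SECURE_OPEN, FALSE, &controlDevice);
    if (!NT_SUCCESS(status))
        return status;

    /* Symbolic link so user-mode clients can keep opening the same name. */
    status = IoCreateSymbolicLink(&dosName, &ntName);
    if (!NT_SUCCESS(status)) {
        IoDeleteDevice(controlDevice);
        return status;
    }

    controlDevice->Flags &= ~DO_DEVICE_INITIALIZING;
    return STATUS_SUCCESS;
}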
The bottom line, though, is that IBAL wasn't really designed for
Windows. It was made to work on Windows, but that's far from the same
thing. I think it would be worthwhile to think about what things are
good and what things are bad in the existing driver model, and then see
what the best path for addressing them would be. Treating issues one at
a time doesn't necessarily get to the desired end result as fast as
evaluating the problem as a whole.
-Fab
From: Jan Bottorff [mailto:jbottorff at xsigo.com]
Sent: Monday, May 14, 2007 4:41 PM
To: Fab Tillier; Yossi Leybovich; ofw at lists.openfabrics.org
Subject: RE: [ofw] Loading drivers on LHS
>For kernel clients, the existing mechanism of querying for the IBAL
>interface via IRP_MN_QUERY_INTERFACE requests should not be changed.
>This is the proper way for kernel drivers to get one another's
>interfaces, and there is support for PnP notifications so that drivers
>can be unloaded and updated. Even querying the CI interface is fine.
My team and I have a lot of experience being a root-enumerated kernel
client of the IBAL interface, and there is a real problem when it comes
to power management. A peer driver that opens the IBAL interface (and
probably registers for device interface notification to know when the
IBAL device exists) is not in the PnP hierarchy of the HCA, and as a
result there is no power relationship between the HCA driver and the
peer driver. This means that when the system tries to standby or
hibernate (which the IB stack/HCA now supports), the HCA stack may get
powered down before the kernel client. As a result, on sleep events, a
kernel client has no guarantee that fabric communication still works
when it's processing device power IRPs. The client can't cleanly shut
down its QPs or the entity at the remote end of each QP, which is a
problem.
Our belief is also that, since our client driver doesn't really talk to
hardware directly and only issues I/O requests to the IB stack, the IB
stack is supposed to queue any I/O requests if the HCA/IB stack powers
down and, when power is re-enabled, process those queued requests
(which may complete with an error, since the remote end may now be
gone). The behavior we see is that the IB stack crashes if we continue
to issue I/O requests after the HCA/IB stack has powered down. It seems
like the IBAL interface should behave similarly to the TDI interface of
TCP/IP: kernel clients doing TCP communication aren't required to close
communication with remote endpoints on system sleep before the NIC
powers down, although some clients may want to do so (like a storage
stack that wants to flush buffers before communication fails).
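Roughly, the behavior we would expect from the IB stack is something
like the following sketch (the structure and the IbStackQueueOrProcess /
IbStackProcessRequest names are purely illustrative; a production driver
would use a cancel-safe queue via the IoCsq routines):

#include <ntddk.h>

NTSTATUS IbStackProcessRequest(PIRP Irp);   /* hypothetical fast path */

typedef struct _IB_QUEUE_CTX {
    KSPIN_LOCK Lock;
    LIST_ENTRY PendingIrps;   /* IRPs held while the HCA is powered down */
    BOOLEAN    PoweredOn;     /* cleared on Dx, set again on return to D0 */
} IB_QUEUE_CTX;

NTSTATUS IbStackQueueOrProcess(IB_QUEUE_CTX *Ctx, PIRP Irp)
{
    KIRQL irql;

    KeAcquireSpinLock(&Ctx->Lock, &irql);
    if (!Ctx->PoweredOn) {
        /* Hold the request instead of touching powered-down hardware.
         * On return to D0 the list is drained; requests that can no
         * longer be satisfied (e.g. the remote end is gone) complete
         * with an error. */
        IoMarkIrpPending(Irp);
        InsertTailList(&Ctx->PendingIrps, &Irp->Tail.Overlay.ListEntry);
        KeReleaseSpinLock(&Ctx->Lock, irql);
        return STATUS_PENDING;
    }
    KeReleaseSpinLock(&Ctx->Lock, irql);

    return IbStackProcessRequest(Irp);
}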
Since we have no power IRP ordering relationship with the IB stack, and
the IB stack can't cope with I/O requests when it's powered down, about
all we can do is make our kernel client driver fail any OS attempt to
standby or hibernate the system. At system shutdown, our kernel client
registers for last-chance shutdown notification, so it is able to end
all communication before the HCA is powered down.
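For reference, that registration is the standard last-chance shutdown
pattern; a minimal sketch (the routine names are ours):

#include <ntddk.h>

/* Called for IRP_MJ_SHUTDOWN; last-chance notifications are delivered
 * after ordinary shutdown handlers, shortly before power is removed, and
 * the HCA is still powered at this point. */
NTSTATUS ClientShutdown(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    UNREFERENCED_PARAMETER(DeviceObject);

    /* Tear down all fabric communication here: flush, disconnect,
     * destroy QPs. */

    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

NTSTATUS RegisterForShutdown(PDRIVER_OBJECT DriverObject,
                             PDEVICE_OBJECT DeviceObject)
{
    DriverObject->MajorFunction[IRP_MJ_SHUTDOWN] = ClientShutdown;
    return IoRegisterLastChanceShutdownNotification(DeviceObject);
}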
Our belief is that the right way to handle all this is to make our
kernel client a PnP child of the HCA/IB bus. To do this, it seems we
will need to modify the IB stack drivers to instantiate an arbitrary
list of PnP IDs (read from the registry) as IB bus children, for every
IB port, very much like the IPoIB PDO is hard-coded today. This will
cause the OS to load our PnP kernel client FDO in the PnP tree of the
HCA. If we do this, the OS will then power down all the child stacks
before it powers down the HCA/IB bus. I assume other kernel clients have
the same problem.
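On the bus driver side, that mostly amounts to reporting one extra PDO
per port when answering IRP_MN_QUERY_DEVICE_RELATIONS for BusRelations;
a sketch (the PORT_EXT layout and ClientPdo field are made up, and a
real handler would also merge with any relations already in the IRP):

#include <ntddk.h>

typedef struct _PORT_EXT {
    PDEVICE_OBJECT IpoibPdo;   /* existing, hard-coded IPoIB child */
    PDEVICE_OBJECT ClientPdo;  /* extra child created from a registry-listed PnP ID */
} PORT_EXT;

NTSTATUS ReportBusRelations(PORT_EXT *Port, PIRP Irp)
{
    ULONG count = (Port->ClientPdo != NULL) ? 2 : 1;
    PDEVICE_RELATIONS rel;

    rel = ExAllocatePoolWithTag(PagedPool,
            FIELD_OFFSET(DEVICE_RELATIONS, Objects) +
            count * sizeof(PDEVICE_OBJECT), 'drbI');
    if (rel == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    /* The PnP manager dereferences each reported child and frees the
     * buffer when it is done, so take a reference on each PDO here. */
    rel->Count = 0;
    rel->Objects[rel->Count++] = Port->IpoibPdo;
    ObReferenceObject(Port->IpoibPdo);
    if (Port->ClientPdo != NULL) {
        rel->Objects[rel->Count++] = Port->ClientPdo;
        ObReferenceObject(Port->ClientPdo);
    }

    Irp->IoStatus.Information = (ULONG_PTR)rel;
    Irp->IoStatus.Status = STATUS_SUCCESS;
    return STATUS_SUCCESS;
}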
A significant effect of making the IBAL interface driver a filter on the
HCA is that if you have multiple HCAs, you'll have multiple IBAL
interface devices, which can't all have the same name. This may be a
problem for some kernel and user clients if there is no longer a single
IBAL interface device.
Having a single root IBAL interface device has always seemed problematic
for kernel clients for other reasons as well. One is that some IB
requests use physical addresses, which only have meaning relative to a
specific HCA on a specific PCI bus. There is no guarantee that processor
physical address == HCA bus physical address, so addresses may need to
be mapped at a layer that knows what happens in the PCI bridges. There
is currently an API to retrieve the actual HCA device for an IB port,
which allows a kernel client to obtain the correct DMA adapter object,
which is needed to correctly map virtual to physical addresses. Devices
that are instantiated as children of the HCA get the benefit of hidden
APIs, like QUERY_INTERFACE for GUID_BUS_INTERFACE_STANDARD to get the
DMA adapter that matches the HCA.
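For reference, the query a child of the HCA would make looks roughly
like this sketch (modeled on the usual WDK pattern; TargetDevice is the
next-lower device in the child's own stack, which bottoms out at the PDO
the IB bus driver created):

#include <ntddk.h>
#include <initguid.h>
#include <wdmguid.h>   /* GUID_BUS_INTERFACE_STANDARD */

NTSTATUS GetHcaBusInterface(PDEVICE_OBJECT TargetDevice,
                            BUS_INTERFACE_STANDARD *BusIf)
{
    KEVENT event;
    IO_STATUS_BLOCK ioStatus;
    PIRP irp;
    PIO_STACK_LOCATION stack;
    NTSTATUS status;

    KeInitializeEvent(&event, NotificationEvent, FALSE);
    irp = IoBuildSynchronousFsdRequest(IRP_MJ_PNP, TargetDevice, NULL, 0,
                                       NULL, &event, &ioStatus);
    if (irp == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    /* PnP IRPs must start out as STATUS_NOT_SUPPORTED. */
    irp->IoStatus.Status = STATUS_NOT_SUPPORTED;

    stack = IoGetNextIrpStackLocation(irp);
    stack->MinorFunction = IRP_MN_QUERY_INTERFACE;
    stack->Parameters.QueryInterface.InterfaceType =
        &GUID_BUS_INTERFACE_STANDARD;
    stack->Parameters.QueryInterface.Size = sizeof(*BusIf);
    stack->Parameters.QueryInterface.Version = 1;
    stack->Parameters.QueryInterface.Interface = (PINTERFACE)BusIf;
    stack->Parameters.QueryInterface.InterfaceSpecificData = NULL;

    status = IoCallDriver(TargetDevice, irp);
    if (status == STATUS_PENDING) {
        KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
        status = ioStatus.Status;
    }

    /* On success, BusIf->GetDmaAdapter(BusIf->Context, ...) returns the
     * DMA adapter that matches the HCA's bus; call
     * BusIf->InterfaceDereference when done with the interface. */
    return status;
}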
We would certainly like to see the IB stack structure move in a
direction that accommodates kernel-mode clients that are not created via
IB IOC detection protocols, and allows them to manage power states
correctly.
- Jan