[ofw] Loading drivers on LHS

Jan Bottorff jbottorff at xsigo.com
Mon May 14 16:41:10 PDT 2007


>For kernel clients, the existing mechanism of querying for the IBAL
>interface via IRP_MN_QUERY_INTERFACE requests should not be changed.
>This is the proper way for kernel drivers to get one another's
>interfaces, and there is support for PnP notifications so that drivers
>can be unloaded and updated. Even querying the CI interface is fine.

 

My team and I have a good deal of experience being a root-enumerated
kernel client of the IBAL interface, and there is a real problem when it
comes to power management. A peer driver that opens the IBAL interface
(and probably registers for device interface notification to know when
the IBAL device exists) is not in the PnP hierarchy of the hca, and as a
result there is no power relationship between the hca driver and the
peer driver. This means that when the system tries to standby or
hibernate (which the IB stack/hca now supports), the hca stack may get
powered down before the kernel client. As a result, on sleep events, a
kernel client has no guarantee that fabric communication still works
while it's processing device power irps. The client can't cleanly shut
down its QP's and coordinate with the entity at the remote end of each
QP, which is a problem.
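To make the scenario concrete, here's roughly the registration a
root-enumerated client does today to learn when the IBAL device exists.
This is a sketch, not the actual IBAL client code; the interface-class
GUID parameter and the comments' helper steps are placeholders:

```c
/* Sketch: device-interface-change notification for a root-enumerated
 * peer driver (kernel mode, WDM).  The IBAL interface class GUID is
 * passed in by the caller; names here are illustrative only. */
#include <wdm.h>

static PVOID g_NotificationEntry;

static NTSTATUS
IbalInterfaceNotification(
    PVOID NotificationStructure,
    PVOID Context)
{
    PDEVICE_INTERFACE_CHANGE_NOTIFICATION notify =
        (PDEVICE_INTERFACE_CHANGE_NOTIFICATION)NotificationStructure;

    UNREFERENCED_PARAMETER(Context);

    if (IsEqualGUID(&notify->Event, &GUID_DEVICE_INTERFACE_ARRIVAL)) {
        /* IBAL device appeared: open notify->SymbolicLinkName and query
         * the IBAL interface.  Note that even after this succeeds there
         * is still no power relationship with the hca stack. */
    } else if (IsEqualGUID(&notify->Event, &GUID_DEVICE_INTERFACE_REMOVAL)) {
        /* IBAL device going away: drop interface references. */
    }
    return STATUS_SUCCESS;
}

NTSTATUS
RegisterForIbalArrival(
    PDRIVER_OBJECT DriverObject,
    const GUID *InterfaceClassGuid)    /* the IBAL interface class GUID */
{
    return IoRegisterPlugPlayNotification(
        EventCategoryDeviceInterfaceChange,
        PNPNOTIFY_DEVICE_INTERFACE_INCLUDE_EXISTING_INTERFACES,
        (PVOID)InterfaceClassGuid,
        DriverObject,
        IbalInterfaceNotification,
        NULL,
        &g_NotificationEntry);
}
```

The point of showing it is the last comment: this mechanism tells the
client when the device exists, but the PnP manager never learns of a
dependency, so no power ordering follows from it.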

 

We also believe that, since our client driver doesn't really talk to
hardware directly but only issues I/O requests to the IB stack, the IB
stack is supposed to queue any I/O requests if the hca/IB stack powers
down, and process those queued requests when power is re-enabled
(possibly completing them with an error, since the remote end may now be
gone). The behavior we actually see is that the IB stack crashes if we
continue to issue I/O requests after the hca/IB stack has powered down.
It seems like the IBAL interface should act similarly to the TDI
interface of TCP/IP: kernel clients doing TCP communication aren't
required to close communication with remote endpoints on system sleep
before the NIC powers down, although some clients may want to do this
(like a storage stack that wants to flush buffers before communication
fails).
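The queue-while-powered-down behavior we'd expect from the IB stack is a
standard pattern.  A minimal sketch, with hypothetical names throughout
(a production driver would use a cancel-safe queue, the IoCsq routines,
rather than this bare list):

```c
/* Sketch: pend I/O while the device is in a low-power state, drain on
 * power-up.  STACK_EXT, ProcessIoNow, and OnPowerUp are hypothetical. */
#include <wdm.h>

typedef struct _STACK_EXT {
    KSPIN_LOCK  QueueLock;
    LIST_ENTRY  PendingIrps;
    BOOLEAN     PoweredOn;     /* tracked from the device power irps */
} STACK_EXT, *PSTACK_EXT;

NTSTATUS ProcessIoNow(PSTACK_EXT Ext, PIRP Irp);  /* hypothetical normal path */

NTSTATUS
DispatchIo(PSTACK_EXT Ext, PIRP Irp)
{
    KIRQL irql;

    KeAcquireSpinLock(&Ext->QueueLock, &irql);
    if (!Ext->PoweredOn) {
        /* Low-power state: pend the request instead of touching the
         * powered-down hardware (or crashing, as today). */
        IoMarkIrpPending(Irp);
        InsertTailList(&Ext->PendingIrps, &Irp->Tail.Overlay.ListEntry);
        KeReleaseSpinLock(&Ext->QueueLock, irql);
        return STATUS_PENDING;
    }
    KeReleaseSpinLock(&Ext->QueueLock, irql);

    return ProcessIoNow(Ext, Irp);
}

VOID
OnPowerUp(PSTACK_EXT Ext)
{
    KIRQL irql;
    LIST_ENTRY drain;

    InitializeListHead(&drain);
    KeAcquireSpinLock(&Ext->QueueLock, &irql);
    Ext->PoweredOn = TRUE;
    while (!IsListEmpty(&Ext->PendingIrps)) {
        InsertTailList(&drain, RemoveHeadList(&Ext->PendingIrps));
    }
    KeReleaseSpinLock(&Ext->QueueLock, irql);

    while (!IsListEmpty(&drain)) {
        PLIST_ENTRY e = RemoveHeadList(&drain);
        PIRP irp = CONTAINING_RECORD(e, IRP, Tail.Overlay.ListEntry);
        /* May now complete with an error: the remote end may be gone. */
        ProcessIoNow(Ext, irp);
    }
}
```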

 

Since we have no power irp ordering relationship with the IB stack, and
the IB stack can't cope with I/O requests when it's powered down, about
all we can do is have our kernel client driver veto any OS attempt to
standby/hibernate the system. At system shutdown, our kernel client
registers for last-chance shutdown notification, so it is able to end
all communication before the hca is powered down.
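For reference, the shutdown case works because of ordering the sleep
case lacks: a last-chance shutdown registration delivers IRP_MJ_SHUTDOWN
after normal shutdown handlers and file-system flushes, giving the
client a window to tear down fabric communication.  A sketch (the
tear-down step is a hypothetical placeholder):

```c
/* Sketch: last-chance shutdown registration (kernel mode, WDM). */
#include <wdm.h>

NTSTATUS
ShutdownDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    UNREFERENCED_PARAMETER(DeviceObject);

    /* End all fabric communication here: close QP's and notify the
     * remote peers (hypothetical tear-down, omitted). */

    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

NTSTATUS
SetupShutdownNotification(
    PDRIVER_OBJECT DriverObject,
    PDEVICE_OBJECT DeviceObject)
{
    DriverObject->MajorFunction[IRP_MJ_SHUTDOWN] = ShutdownDispatch;
    /* "Last chance" variant: IRP_MJ_SHUTDOWN arrives after file systems
     * have flushed, so this runs very late in shutdown. */
    return IoRegisterLastChanceShutdownNotification(DeviceObject);
}
```

There is no analogous ordered notification for standby/hibernate for a
driver outside the hca's PnP tree, which is the core of the problem.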

 

We believe the right way to handle all this is to make our kernel client
become a PnP child of the hca/IB bus. To do this, it seems we will need
to modify the IB stack drivers to instantiate an arbitrary list of PnP
id's (read from the registry) as IB bus children, for every IB port,
very much like the IPoIB pdo is hard coded today. This will cause the OS
to load our kernel client's fdo in the PnP tree of the hca. If we do
this, the OS will then power down all the child stacks before it powers
down the hca/IB bus. I assume other kernel clients have the same
problem.

 

A significant effect of making the IBAL interface driver a filter on the
hca is that if you have multiple hca's, you'll have multiple IBAL
interface devices, which can't all have the same name. This may be a
problem for some kernel and user clients if there is no longer a single
IBAL interface device.

 

Having a single root IBAL interface device has always seemed problematic
for kernel clients for other reasons as well. One is that some IB
requests use physical addresses, which only have meaning relative to a
specific hca on a specific PCI bus. There is no guarantee that a
processor physical address == the hca's bus physical address, so
addresses may need to get mapped at a layer that knows what happens in
the PCI bridges. There is currently an API to retrieve the actual hca
device on an IB port, which allows a kernel client to obtain the correct
DMA adapter object needed to correctly map virtual to physical
addresses. Devices that are instantiated as children of the hca get the
benefits of hidden API's, like QUERY_INTERFACE for
GUID_BUS_INTERFACE_STANDARD to get the DMA adapter that matches the hca.
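Concretely, a child of the hca can send IRP_MN_QUERY_INTERFACE for
GUID_BUS_INTERFACE_STANDARD down its own stack and then call
BUS_INTERFACE_STANDARD.GetDmaAdapter to get the adapter that matches the
hca's bus.  A sketch with error handling trimmed (the TopOfStack
parameter is the device our fdo attached to):

```c
/* Sketch: querying BUS_INTERFACE_STANDARD from inside the hca's PnP
 * tree (kernel mode, WDM).  GUID_BUS_INTERFACE_STANDARD comes from
 * wdmguid.h (include initguid.h first in exactly one source file). */
#include <wdm.h>

NTSTATUS
QueryBusInterfaceStandard(
    PDEVICE_OBJECT TopOfStack,
    BUS_INTERFACE_STANDARD *BusIf)
{
    KEVENT event;
    IO_STATUS_BLOCK iosb;
    PIRP irp;
    PIO_STACK_LOCATION sp;
    NTSTATUS status;

    KeInitializeEvent(&event, NotificationEvent, FALSE);

    irp = IoBuildSynchronousFsdRequest(IRP_MJ_PNP, TopOfStack,
                                       NULL, 0, NULL, &event, &iosb);
    if (irp == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    /* PnP irps must start out failed with STATUS_NOT_SUPPORTED. */
    irp->IoStatus.Status = STATUS_NOT_SUPPORTED;

    sp = IoGetNextIrpStackLocation(irp);
    sp->MinorFunction = IRP_MN_QUERY_INTERFACE;
    sp->Parameters.QueryInterface.InterfaceType =
        &GUID_BUS_INTERFACE_STANDARD;
    sp->Parameters.QueryInterface.Size = sizeof(*BusIf);
    sp->Parameters.QueryInterface.Version = 1;
    sp->Parameters.QueryInterface.Interface = (PINTERFACE)BusIf;
    sp->Parameters.QueryInterface.InterfaceSpecificData = NULL;

    status = IoCallDriver(TopOfStack, irp);
    if (status == STATUS_PENDING) {
        KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
        status = iosb.Status;
    }
    return status;
}
```

On success, BusIf->GetDmaAdapter(BusIf->Context, ...) returns the DMA
adapter object for the hca's bus, and BusIf->InterfaceDereference should
be called when the interface is no longer needed.  A root-enumerated
client outside the hca's tree has no stack to send this query down,
which is exactly the benefit children of the hca get for free.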

 

We would certainly like to see the IB stack structure move in a
direction that accommodates kernel mode clients that are not created via
IB IOC detection protocols, and allows them to manage power states
correctly.

 

- Jan

 

 
