[Openib-windows] Using of fast mutexes in WinIb

Thu Nov 17 03:05:43 PST 2005

> Aren't drivers expected to check in the I/O completion 
> callback if the IRP was marked pending and if not delay 
> processing until the IoCallDriver call unwinds?
> If the IRP was marked pending isn't issuing the next I/O 
> request from the I/O completion callback OK?

It's really amazing sometimes how few instructions might get executed in
a thread between interruptions. For example, you call a write dispatch
routine, and the driver takes out a spinlock (raising you to
DISPATCH_LEVEL) while it updates a hardware memory ring buffer. In the
meantime, some other device interrupts and queues a DPC. The dispatch
routine finishes its ring update and releases the spinlock, and because
of the queued DPC, the dispatch routine is interrupted by that other DPC.
If that DPC runs for a few hundred microseconds, your I/O may complete
before you ever get a chance to return from KeReleaseSpinLock, and you
end up running your own DPC while still in your dispatch routine. Your
DPC calls IoCompleteRequest (or some other completion callback).

I have to think about what you said about it being OK to issue I/O in
the completion routine if an IRP is marked pending. Taking the
example above, it seems like the driver would have to mark the IRP
pending BEFORE initiating the I/O to the hardware, assuming the I/O
completion might run on a different processor and the spinlock didn't
protect completion processing. If it waited until after it initiated
I/O, the IRP might be GONE by the time the CPU can mark it pending. 

So let's assume we're on a single CPU system, and the IRP is marked
pending before I/O is initiated. In the above scenario, the completion
routine would be called while still in the dispatch routine, and it
seems like the IRP would be marked pending, as the dispatch routine was
returning pending, even though the I/O has completed already.
IoCompleteRequest doesn't clear the IRP pending flag, and the only way
we can be in the IRP completion routine is if IoCompleteRequest was
called. It seems like the IRP pending flag has no effect on the above
scenario, and we could nest into multiple initiate/complete pairs,
potentially causing stack overflow.
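
To make the nesting concern concrete, here is a minimal user-mode sketch
in plain C. It is not WDK code, and every name in it (Request,
lower_submit, run_naive, run_deferred) is hypothetical. It assumes a
lower layer that always completes before its submit call returns, and it
uses an explicit in_call flag rather than the IRP pending flag to tell
the completion routine that the issuer is still on the stack. The naive
version issues the next request from the completion routine and nests
once per request; the deferred version only records the completion and
lets the issuing loop send the next request after the call unwinds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode model; none of these names are WDK APIs. */
typedef struct Request Request;
typedef void (*CompletionFn)(Request *req);

struct Request {
    CompletionFn on_complete;
    int remaining;        /* requests still to be issued */
    int in_call;          /* nonzero while the issuer is inside lower_submit() */
    int completed_inline; /* set when completion ran before submit returned */
    int depth;            /* current nesting depth of lower_submit() */
    int max_depth;        /* deepest nesting observed so far */
};

/* Models a lower layer that completes the request before returning. */
static void lower_submit(Request *req)
{
    req->depth++;
    if (req->depth > req->max_depth)
        req->max_depth = req->depth;
    req->on_complete(req);
    req->depth--;
}

/* Naive completion: issues the next request directly, so calls nest. */
static void naive_complete(Request *req)
{
    if (req->remaining > 0) {
        req->remaining--;
        lower_submit(req);
    }
}

/* Deferred completion: if the issuer is still on the stack, only record
 * that we completed and let the issuer continue after it unwinds. */
static void deferred_complete(Request *req)
{
    if (req->in_call)
        req->completed_inline = 1;
    /* An asynchronous completion would issue the next request here;
     * this model never takes that path. */
}

/* Issues `count` requests the naive way; returns the deepest nesting. */
int run_naive(int count)
{
    Request req = {0};
    assert(count > 0);
    req.on_complete = naive_complete;
    req.remaining = count - 1;   /* the first request is issued below */
    lower_submit(&req);
    return req.max_depth;
}

/* Issues `count` requests from a loop; returns the deepest nesting. */
int run_deferred(int count)
{
    Request req = {0};
    assert(count > 0);
    req.on_complete = deferred_complete;
    req.remaining = count;
    while (req.remaining > 0) {
        req.remaining--;
        req.completed_inline = 0;
        req.in_call = 1;
        lower_submit(&req);
        req.in_call = 0;
        if (!req.completed_inline)
            break;   /* would wait for the asynchronous completion */
    }
    return req.max_depth;
}
```

In this model run_naive(100) nests 100 levels deep while run_deferred(100)
never goes past one. A real driver would have to update the flags with
interlocked operations, since completion can run on another processor.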

There was an article in OSR's NT Insider about this very issue in the
last 6 months. I'll try to hunt it down.

I have to think about this some more, and understand again what the
pending flag is useful for. It seems like I have to relearn it about
every 2 years. I think it was only a year ago that I went through very
extensive research on IRP completion, and found a bug in the Microsoft
KB examples on IRP completion. It's perhaps reassuring that not only is
it a confusing topic to me and many developers, it's a confusing topic
to Microsoft developers who write the examples that are supposed to be
correct. 

> One of the main reasons I am pushing for things to work at 
> DISPATCH_LEVEL is to allow kernel clients that get called by 
> a port driver (like StorPort, for
> example) to make verb calls from the context of their entry 
> points, rather than having to find a way to get into a thread 
> context capable of talking to the HCA.

That's an ugly scenario I hadn't thought about. I did think about
needing to call fabric functions from GetScatterGatherList's
ExecutionRoutine (which the docs say will be at DISPATCH_LEVEL).

> This will push the thread context switch into the HCA if 
> required, rather than duplicating this functionality in all clients.

That makes it seem easier for clients, but if the HCA is basically just
passing things to a worker thread at PASSIVE_LEVEL, I'm not sure hiding
it is better. Is the core issue that some of the IB APIs must block to
perform their function?
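
For reference, the hand-off being discussed can be modelled roughly as
below. This is a single-threaded user-mode sketch in plain C, not the
HCA driver's code. All names (verb_call, worker_drain, blocking_verb,
the level enum) are hypothetical, and the assumption is that a verb
which may block runs inline only at PASSIVE_LEVEL and is otherwise
queued for a worker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IRQLs; not the real WDK constants. */
enum level { LEVEL_PASSIVE = 0, LEVEL_DISPATCH = 2 };

#define QUEUE_CAPACITY 8

typedef void (*work_fn)(void *ctx);

struct work_item {
    work_fn fn;
    void *ctx;
};

/* Fixed-size ring of deferred work items. */
struct work_queue {
    struct work_item items[QUEUE_CAPACITY];
    size_t head;
    size_t count;
};

/* Appends one item; returns 0 on success, -1 if the queue is full. */
static int queue_push(struct work_queue *q, work_fn fn, void *ctx)
{
    size_t tail;
    if (q->count == QUEUE_CAPACITY)
        return -1;
    tail = (q->head + q->count) % QUEUE_CAPACITY;
    q->items[tail].fn = fn;
    q->items[tail].ctx = ctx;
    q->count++;
    return 0;
}

/* Runs the verb inline when the caller may block, otherwise defers it.
 * Returns 1 if it ran inline, 0 if it was queued, -1 if the queue is full. */
int verb_call(struct work_queue *q, enum level current, work_fn fn, void *ctx)
{
    assert(fn != NULL);
    if (current == LEVEL_PASSIVE) {
        fn(ctx);
        return 1;
    }
    return queue_push(q, fn, ctx) == 0 ? 0 : -1;
}

/* What the worker would do at passive level: run everything queued.
 * Returns the number of items that were run. */
size_t worker_drain(struct work_queue *q)
{
    size_t ran = 0;
    while (q->count > 0) {
        struct work_item item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        item.fn(item.ctx);
        ran++;
    }
    return ran;
}

/* Example verb that is assumed to block; here it only counts its calls. */
void blocking_verb(void *ctx)
{
    (*(int *)ctx)++;
}
```

Whether this queue lives in the HCA or in each client, the caller at
DISPATCH_LEVEL still gets its result later, so it has to be written for
asynchronous completion either way.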

- Jan