[Openib-windows] Crash on driver unload

Leonid Keller leonid at mellanox.co.il
Tue May 23 03:02:47 PDT 2006


Hi, Fab.
I got a blue screen during a post_send operation in a test.
The direct cause was that, at that moment, the driver was at the
mthca_cmd_cleanup() line of mthca_remove_one(), waiting for a command
after having released almost all of its resources.
mthca_remove_one() was called from __PowerDownCb().

This raises two questions:

1. (a minor one) Why could the driver receive a SET_POWER request?
The computer was not being powered down. The only action related to the
crash - and I'm not sure that it was really done - was disabling the
ibbus.sys driver.
Could that somehow cause a SET_POWER request to be sent?

2. (a major one) Why do we have the crash?
Please look at __PowerDownCb() (the same happens in
hca_release_resources()):
I first call p_ext->ci_ifc.deregister_ca() and then immediately start
releasing the driver resources synchronously.
But as far as I can see, the call to deregister_ca() just starts an
asynchronous process of closing the CA instance, which doesn't prevent - at
least for some time - kernel clients from continuing to work.
I also saw the following comment in ib_deregister_ca():
	/* TODO: Before destroying, do a query PnP call and return IB_BUSY as needed. */

We need to do several things here, IMO:
	- deregister_ca() has to set a flag, before it starts closing the CA,
that prevents any further work with the driver;
	- this flag must be checked in all the verbs;
	- closing the CA, I believe, requires the driver resources, so the
driver has to wait for that process to complete before it starts
releasing its own resources;
	- decide what to do when deregister_ca() wants to return
IB_BUSY. What should the driver's behaviour be in that case?
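
To make the first three points concrete, here is a minimal user-space sketch in C11 of a "closing" flag combined with a count of in-flight verbs. It is only an illustration under assumptions: hca_gate_t, the gate_*() helpers and demo_post_send() are hypothetical names, not existing mthca or IBAL code, and a real driver would block on an event instead of polling gate_drained(). The Windows kernel's rundown protection routines (ExAcquireRundownProtection() and friends) offer a similar facility and might be a better fit than a hand-rolled gate.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-HCA gate; not part of the actual driver sources. */
typedef struct hca_gate {
    atomic_bool closing;   /* set once deregistration has begun */
    atomic_int  in_flight; /* number of verbs currently executing */
} hca_gate_t;

static void gate_init(hca_gate_t *g)
{
    atomic_init(&g->closing, false);
    atomic_init(&g->in_flight, 0);
}

/* Called at the top of every verb. Returns false if the HCA is going away.
 * The counter is incremented BEFORE the flag is read, so a concurrent
 * closer either sees the count or the verb sees the flag. */
static bool gate_enter(hca_gate_t *g)
{
    atomic_fetch_add(&g->in_flight, 1);
    if (atomic_load(&g->closing)) {
        atomic_fetch_sub(&g->in_flight, 1);
        return false;
    }
    return true;
}

/* Called on every exit path of a verb that entered successfully. */
static void gate_leave(hca_gate_t *g)
{
    atomic_fetch_sub(&g->in_flight, 1);
}

/* Removal, step 1: refuse any new verbs. */
static void gate_close(hca_gate_t *g)
{
    atomic_store(&g->closing, true);
}

/* Removal, step 2: true once no verb is in flight any more; only then
 * is it safe to start releasing the driver resources. */
static bool gate_drained(hca_gate_t *g)
{
    return atomic_load(&g->in_flight) == 0;
}

/* Stand-in for a verb such as post_send. */
static int demo_post_send(hca_gate_t *g)
{
    if (!gate_enter(g))
        return -1;          /* HCA is being removed: fail the request */
    /* ... the real verb would touch HCA resources here ... */
    gate_leave(g);
    return 0;
}
```

With something like this, the removal path would call gate_close() first, wait until gate_drained() returns true, and only then go on to mthca_cmd_cleanup() and the rest of the teardown. It does not answer the IB_BUSY question in the last point.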
	
Thoughts?


