[ofw] A crash in WmReceiveHandler

Hefty, Sean sean.hefty at intel.com
Thu Dec 16 08:18:41 PST 2010


> As you can see the crash occurred  because pDevice == NULL.
> 
> It looks like hCa and hQp are in CL_INITIALIZED state, so
> WmRegRemoveHandler or WmRegInit (error flow) probably did not put NULL in
> the pDevice yet.

Are we hitting an error flow?

> Is it possible that in WmRegInit the line:
> 
> pRegistration->pDevice = dev;
> 
> was not called yet, but the line:
> 
> ib_status = dev->IbInterface.reg_mad_svc(pRegistration->hQp, &svc,
> &pRegistration->hService);
> 
> was already executed and a mad was received?

This does look like a problem.  Moving the pDevice assignment above the reg_mad_svc() call should fix any potential issue there, and I believe it's safe to do so.
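
To make the ordering issue concrete, here is a minimal, self-contained sketch.  It is not the winmad or ibal code: every type and function name in it (mock_device, mock_registration, mock_reg_mad_svc, and so on) is hypothetical, and the mock registration call delivers one MAD synchronously before it returns, which turns the suspected race into something deterministic.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the winmad/ibal types; not the real definitions. */
struct mock_device {
	int mads_returned;		/* counts put_mad-style returns */
};

struct mock_registration {
	struct mock_device *pDevice;
	void *hService;
};

/* Receive callback modeled on the crash site: it dereferences pDevice. */
static int mock_receive_handler(struct mock_registration *reg)
{
	if (reg->pDevice == NULL)
		return -1;		/* the real handler would fault here */
	if (reg->hService == NULL)
		reg->pDevice->mads_returned++;	/* hand the MAD straight back */
	return 0;
}

/* Mock registration call (assumption): one MAD arrives before it returns. */
static int mock_reg_mad_svc(struct mock_registration *reg, void **ph_svc)
{
	static int svc_token;
	int rc = mock_receive_handler(reg);

	*ph_svc = &svc_token;		/* handle is only set at the very end */
	return rc;
}

/* Order in question: register first, assign pDevice afterwards. */
int init_assign_late(struct mock_registration *reg, struct mock_device *dev)
{
	int rc;

	reg->pDevice = NULL;
	reg->hService = NULL;
	rc = mock_reg_mad_svc(reg, &reg->hService);
	reg->pDevice = dev;
	return rc;
}

/* Proposed order: assign pDevice before the callback can possibly run. */
int init_assign_early(struct mock_registration *reg, struct mock_device *dev)
{
	reg->pDevice = dev;
	reg->hService = NULL;
	return mock_reg_mad_svc(reg, &reg->hService);
}
```

In this model init_assign_late() reports the NULL pDevice, while init_assign_early() lets the handler return the early MAD through the device, which is the behavior the existing hService == NULL branch seems to be written for.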

I do have a concern that the code crashed here:

	if (reg->hService == NULL) {
		reg->pDevice->IbInterface.put_mad(pMad);	<---

This means that hService was NULL, but winmad still received a callback.  I can understand receiving a callback before reg_mad_svc() returns, but from this call:

	ib_status = dev->IbInterface.reg_mad_svc(pRegistration->hQp, &svc,
								&pRegistration->hService);

hService should be set before any callback is invoked.  Looking at the code for reg_mad_svc, the output handle (*ph_mad_svc) is set at the end of the function.  So, it appears that ibal can begin reporting MADs to the user before it has finished initializing the MAD service, and may do so even in the case where reg_mad_svc fails.

We may need to move the call to __mad_disp_reg() to after the assignment of *ph_mad_svc in reg_mad_svc.  I don't know if it's safe to make that change, however.
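
As a rough illustration of that reordering, the sketch below models the two orderings.  It is not the ibal implementation: svc_model, client_model, model_disp_reg() and both model_reg_* functions are hypothetical names, and the dispatcher registration is assumed to deliver a MAD immediately so that the effect is visible without threads.

```c
#include <stddef.h>

/* Hypothetical model of a MAD service object; not the ibal structure. */
struct svc_model {
	int unused;
};

struct client_model {
	void *hService;		/* where the caller wants the handle stored */
	int saw_null_handle;	/* set if a callback ran while hService was NULL */
};

/* Client receive callback: records whether the handle was visible yet. */
static void client_recv_cb(struct client_model *c)
{
	if (c->hService == NULL)
		c->saw_null_handle = 1;
}

/*
 * Stand-in for dispatcher registration (assumption): once registered,
 * a MAD may be reported right away, modeled as a synchronous callback.
 */
static void model_disp_reg(struct client_model *c)
{
	client_recv_cb(c);
}

/* Ordering described above: dispatcher first, handle published last. */
void model_reg_dispatch_first(struct client_model *c, struct svc_model *svc,
			      void **ph_mad_svc)
{
	model_disp_reg(c);
	*ph_mad_svc = svc;
}

/* Suggested ordering: publish the handle, then enable dispatch. */
void model_reg_publish_first(struct client_model *c, struct svc_model *svc,
			     void **ph_mad_svc)
{
	*ph_mad_svc = svc;
	model_disp_reg(c);
}
```

With the handle published first, the callback in this model never observes a NULL hService.  Whether the real reg_mad_svc can be reordered this way is still the open question; in particular, a failure after the handle has been published would presumably need to reset it.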

- Sean


