[openib-general] Re: user_mad.c: deadlock?

Roland Dreier rolandd at cisco.com
Wed Nov 9 11:19:53 PST 2005


    Roland> And also we need ib_umad_kill_port() to wait for any
    Roland> in-progress ib_umad_close() calls, since we don't want to
    Roland> call ib_unregister_mad_agent() after we've returned from
    Roland> the device removal call.

    Michael> This should work fine too since the last down_write that
    Michael> detects that the list is empty will flush these guys
    Michael> out.

The problem I run into trying to implement this is that both
ib_umad_close() and ib_umad_kill_port() need to do something like:

	down_write(&port->mutex);
	agent = file->agent[id];
	file->agent[id] = NULL;
	up_write(&port->mutex);

	if (agent)
		ib_unregister_mad_agent(agent);

but ib_umad_close() could stall for an arbitrarily long time after
dropping the mutex and before calling ib_unregister_mad_agent(), and
so end up calling that function after the device is already gone.
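
To make the window concrete, here is a rough single-threaded userspace sketch of that interleaving. Everything in it (struct mad_agent, struct umad_file, claim_agent(), finish(), simulate_late_close()) is a hypothetical stand-in, not the real user_mad.c code, and the port mutex is left out because the steps are replayed sequentially in the problematic order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real kernel types. */
struct mad_agent { bool registered; };
struct umad_file { struct mad_agent *agent; };

static bool device_gone;      /* set once the simulated kill_port has returned */
static int  late_unregisters; /* unregister calls made after device removal */

static void unregister_agent(struct mad_agent *agent)
{
	if (device_gone)
		late_unregisters++; /* the situation we want to avoid */
	agent->registered = false;
}

/*
 * Step 1 of either path: take the agent pointer and clear the slot.
 * In the real code this would happen under down_write(&port->mutex).
 */
static struct mad_agent *claim_agent(struct umad_file *file)
{
	struct mad_agent *agent = file->agent;

	file->agent = NULL;
	return agent;
}

/* Step 2 of either path: unregister after the mutex has been dropped. */
static void finish(struct mad_agent *agent)
{
	if (agent)
		unregister_agent(agent);
}

/* Replays the bad ordering; returns the number of late unregister calls. */
int simulate_late_close(void)
{
	struct mad_agent agent = { .registered = true };
	struct umad_file file = { .agent = &agent };
	struct mad_agent *claimed_by_close;

	device_gone = false;
	late_unregisters = 0;

	/* close() claims the agent, then stalls before unregistering. */
	claimed_by_close = claim_agent(&file);

	/* kill_port() runs to completion: the slot is NULL, so it has
	 * nothing to unregister and nothing telling it to wait. */
	finish(claim_agent(&file));
	device_gone = true;

	/* close() resumes and unregisters after the device is gone. */
	finish(claimed_by_close);

	return late_unregisters;
}
```

In this sketch simulate_late_close() returns 1: clearing the slot under the mutex tells the kill path that there is nothing left for it to unregister, but not that an unregister is still pending in the close path.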

 - R.


