[ofa-general] lock dependency in ib_user_mad

Roland Dreier rdreier at cisco.com
Thu Dec 20 13:35:17 PST 2007


 > I see hangs killing opensm related to a bug in user_mad.c.  The problem appears
 > to be:
 > 
 > ib_umad_close()
 > 	downgrade_write(&file->port->mutex)
 > 	ib_unregister_mad_agent(...)
 > 	up_read(&file->port->mutex)
 > 
 > ib_unregister_mad_agent() flushes any outstanding MADs, resulting in calls to
 > send_handler() and recv_handler(), both of which call queue_packet():
 > 
 > queue_packet()
 > 	down_read(&file->port->mutex)
 > 	...
 > 	up_read(&file->port->mutex)

This should be fine (and comes from an earlier set of changes to fix
deadlocks): ib_umad_close() does a downgrade_write() before calling
ib_unregister_mad_agent(), so it only holds the mutex with a read
lock, which means that queue_packet() should be able to take another
read lock.

Unless there's something that prevents one thread from taking a read
lock twice?  What kernel are you seeing these problems with?

 > Does anyone know the reasoning for holding the mutex around
 > ib_unregister_mad_agent()?

It's to keep things serialized against a port disappearing because a
device is being removed.  But looking at things, I think we can
probably rejigger the locking to make things simpler, and avoid the
use of downgrade_write(), which the -rt people don't like.

 - R.


