[ofa-general] lock dependency in ib_user_mad
Roland Dreier
rdreier at cisco.com
Thu Dec 20 13:35:17 PST 2007
> I see hangs killing opensm related to a bug in user_mad.c. The problem appears
> to be:
>
> ib_umad_close()
>     downgrade_write(&file->port->mutex)
>     ib_unregister_mad_agent(...)
>     up_read(&file->port->mutex)
>
> ib_unregister_mad_agent() flushes any outstanding MADs, resulting in calls to
> send_handler() and recv_handler(), both of which call queue_packet():
>
> queue_packet()
>     down_read(&file->port->mutex)
>     ...
>     up_read(&file->port->mutex)

This should be fine (and comes from an earlier set of changes to fix
deadlocks): ib_umad_close() does a downgrade_write() before calling
ib_unregister_mad_agent(), so it only holds the mutex with a read
lock, which means that queue_packet() should be able to take another
read lock.

Unless there's something that prevents one thread from taking a read
lock twice? What kernel are you seeing these problems with?
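
To make that question concrete, here is a toy model in plain C of one way a
nested read lock could get stuck. It is only a sketch, not the kernel's
rw_semaphore code: the toy_* names and nested_read_ok() are made up for
illustration, and the rule that a queued writer blocks new readers is an
assumption about how a fair rwsem behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: NOT the kernel's rw_semaphore implementation. */
struct toy_rwsem {
	int  readers;          /* threads currently holding a read lock */
	bool writer_active;    /* a thread holds the write lock */
	int  writers_waiting;  /* writers queued behind the current holders */
};

/* Try to take a read lock; false means the caller would have to sleep. */
static bool toy_down_read(struct toy_rwsem *sem)
{
	/* Assumed fairness rule: a queued writer blocks new readers. */
	if (sem->writer_active || sem->writers_waiting > 0)
		return false;
	sem->readers++;
	return true;
}

/* Try to take the write lock; on failure the writer joins the queue. */
static bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->readers > 0) {
		sem->writers_waiting++;
		return false;
	}
	sem->writer_active = true;
	return true;
}

/* Turn a held write lock into a read lock, as downgrade_write() does. */
static void toy_downgrade_write(struct toy_rwsem *sem)
{
	assert(sem->writer_active);
	sem->writer_active = false;
	sem->readers = 1;
}

/*
 * Replay the ib_umad_close() sequence from the report.  Returns true if the
 * nested read lock in queue_packet() would be granted without sleeping.
 */
static bool nested_read_ok(bool writer_queues_in_between)
{
	struct toy_rwsem sem = { 0 };
	bool got_write = toy_down_write(&sem);  /* close path takes the write lock */

	assert(got_write);
	toy_downgrade_write(&sem);              /* now held for read only */
	if (writer_queues_in_between)
		(void)toy_down_write(&sem);     /* another thread queues for write */
	return toy_down_read(&sem);             /* queue_packet() */
}
```

In this model nested_read_ok(false) succeeds and nested_read_ok(true) does
not, so if the real rwsem follows the same rule, the hang would need some
other thread to call down_write() on the port mutex between the
downgrade_write() and the down_read() in queue_packet().
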
> Does anyone know the reasoning for holding the mutex around
> ib_unregister_mad_agent()?

It's to keep things serialized against a port disappearing because a
device is being removed. But looking at things, I think we can
probably rejigger the locking to make things simpler, and avoid the
use of downgrade_write(), which the -rt people don't like.

- R.