[openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
Michael S. Tsirkin
mst at mellanox.co.il
Thu Oct 12 11:13:32 PDT 2006
Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
>
> >But what is the issue? some kind of race?
>
> If we look at just the ib_multicast patches as an example...
>
> Calling ib_join_multicast allocates a struct ib_multicast that must be freed.
> Here's the relevant portion of ipoib's join callback:
>
> @@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s
> /* Clear the busy flag so we try again */
> + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
> + &mcast->flags);
> }
> + return status;
> }
>
> The callback clears the busy flag, and frees the structure by returning a
> non-zero value from the callback. (This is convenient for error handling.) Let
> the callback thread hang around right at the return statement for a while.
>
> When ipoib is unloaded, one of the calls it makes during cleanup is
> ipoib_mcast_leave(), which does:
>
> if(test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
> ib_free_multicast(mcast->mc);
>
> If ipoib_mcast_leave() is called at the same time that an error is reported
> through the callback, it's possible that the struct ib_multicast will be freed
> by the callback thread. But there's nothing to prevent the callback thread from
> executing in the ipoib code after unload has occurred.
>
> Similar issues can apply to ib_cm and rdma_cm.
>
> - Sean
>
But unlike the sa races which were unfixable without API changes,
here users can synchronize the removal of the mc object.
So I think what you describe is a user error.
For example SDP does
lock
cm_id = ssk->cm_id
ssk->cm_id = NULL
unlock
if (cm_id)
destroy(cm_id)
and in callback
ssk = context
lock
if (!ssk->cm_id) {
unlock
return 0;
}
...
--
MST
More information about the general
mailing list