[openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port

Sean Hefty sean.hefty at intel.com
Thu Oct 12 10:52:11 PDT 2006


>But what is the issue? some kind of race?

If we look at just the ib_multicast patches as an example...

Calling ib_join_multicast allocates a struct ib_multicast that must be freed.
Here's the relevant portion of ipoib's join callback:

@@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s
 		/* Clear the busy flag so we try again */
+		status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
+					    &mcast->flags);
 	}
+	return status;
 }

The callback clears the busy flag and, by returning a non-zero value, has
ib_multicast free the structure.  (This is convenient for error handling.)  Now
suppose the callback thread stalls right at the return statement for a while.

When ipoib is unloaded, one of the calls it makes during cleanup is
ipoib_mcast_leave(), which does:

	if(test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
		ib_free_multicast(mcast->mc);

If ipoib_mcast_leave() is called at the same time that an error is reported
through the callback, the busy flag may already be clear, so the struct
ib_multicast is left for the callback thread to free.  But nothing prevents the
callback thread from still executing ipoib code after the unload has completed.
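
To make the interleaving concrete, here is a rough user-space model of it.  This
is only a sketch, not kernel code: struct mcast_model, model_test_and_clear(),
callback_before_return(), callback_return(), leave_and_unload() and
replay_race() are all made-up names, and the "free" is just a counter.  It
replays the ordering described above in a single thread so the result is
deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the ipoib state; not kernel code. */
struct mcast_model {
	atomic_bool busy;        /* models IPOIB_MCAST_FLAG_BUSY */
	int frees;               /* times the struct ib_multicast was freed */
	bool callback_in_flight; /* callback entered but not yet returned */
	bool module_unloaded;    /* models ipoib having been unloaded */
};

/* Models test_and_clear_bit(): clears the flag, returns the old value. */
static bool model_test_and_clear(atomic_bool *flag)
{
	return atomic_exchange(flag, false);
}

/* The callback up to, but not including, its return statement. */
static bool callback_before_return(struct mcast_model *m)
{
	m->callback_in_flight = true;
	return model_test_and_clear(&m->busy); /* non-zero => caller frees */
}

/* The return statement: a non-zero status leads to the free. */
static void callback_return(struct mcast_model *m, bool status)
{
	if (status)
		m->frees++;
	m->callback_in_flight = false;
}

/* Models ipoib_mcast_leave() followed by the rest of module unload. */
static void leave_and_unload(struct mcast_model *m)
{
	if (model_test_and_clear(&m->busy))
		m->frees++;            /* models ib_free_multicast(mcast->mc) */
	m->module_unloaded = true; /* nothing here waits for the callback */
}

/* Replays the race; returns true if the callback outlived the unload. */
bool replay_race(struct mcast_model *m)
{
	bool status, outlived;

	atomic_init(&m->busy, true);
	m->frees = 0;
	m->callback_in_flight = false;
	m->module_unloaded = false;

	status = callback_before_return(m); /* callback stalls at return */
	leave_and_unload(m);                /* sees the flag already clear */
	outlived = m->callback_in_flight && m->module_unloaded;
	callback_return(m, status);         /* runs "after unload" */
	return outlived;
}
```

In this model the structure is freed exactly once, so the busy flag does its job
for the free.  What it does not do is keep the module around until the callback
has actually returned, which is the window replay_race() reports.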

Similar issues can apply to ib_cm and rdma_cm.

- Sean



