[ofa-general] IPoIB kernel Oops -- race condition

Jack Morgenstein jackm at dev.mellanox.co.il
Sun Jun 28 11:18:22 PDT 2009


On Sunday 28 June 2009 19:09, Moni Shoua wrote:
> Maybe synchronizing the race with a completion variable (as IPoIB does in struct ipoib_path) will help. I think this will work. I can send a patch if you want, unless you see that this idea doesn't work for this case.
> 
> MoniS
I just looked at the ipoib_path struct.  I assume that you mean the "valid" field.  I do not think this will work here.
When doing ifconfig ib0 down, ipoib_mcast_leave() needs to wait before testing the busy flag while the join path is anywhere in the following window:

       /* ipoib_mcast_leave() must wait from this point on before testing IPOIB_MCAST_FLAG_BUSY */
       set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
       mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
                                        &rec, comp_mask, GFP_KERNEL,
                                        ipoib_mcast_join_complete, mcast);
       if (IS_ERR(mcast->mc)) {
               clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
               /* ipoib_mcast_leave() must wait until this point -- the above
                  set_bit/clear_bit pair needs to be atomic */
               ....
       } else {
               /* ipoib_mcast_leave() must wait until this point, so that the busy flag
                  is set and mcast->mc is successfully assigned -- i.e., non-NULL */
       }

 in ipoib_mcast_leave():
         /* NEED TO WAIT HERE BEFORE CONTINUING, so that either BUSY is cleared
            (mcast->mc is in error), or BUSY is set and mcast->mc is a valid,
            non-NULL pointer */
         if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
                 ib_sa_free_multicast(mcast->mc);
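
To make the required ordering concrete, here is a minimal userspace sketch of one way to get it: a lock that covers the whole set-busy / join / assign-or-clear window, and that the leave path also takes before testing the flag. This is only an illustration, not a proposed patch. All names below (struct mcast_sketch, fake_sa_join(), mcast_sketch_join(), mcast_sketch_leave()) are hypothetical stand-ins, a pthread mutex stands in for whatever kernel primitive would be appropriate, and whether a lock may be held across ib_sa_join_multicast() in the real driver is not addressed here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ipoib_mcast. */
struct mcast_sketch {
    pthread_mutex_t lock; /* serializes the join window against leave */
    bool busy;            /* models IPOIB_MCAST_FLAG_BUSY */
    void *mc;             /* models mcast->mc; NULL when nothing is held */
    int frees;            /* counts how often leave "freed" mc */
};

static int fake_handle;

/* Fake join call (assumption): returns NULL on failure instead of an ERR_PTR. */
void *fake_sa_join(bool fail)
{
    return fail ? NULL : (void *)&fake_handle;
}

void mcast_sketch_init(struct mcast_sketch *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->busy = false;
    m->mc = NULL;
    m->frees = 0;
}

/* Join path: the busy flag and mc change together under the lock. */
int mcast_sketch_join(struct mcast_sketch *m, bool fail)
{
    int ret = 0;

    pthread_mutex_lock(&m->lock);
    m->busy = true;
    m->mc = fake_sa_join(fail);
    if (m->mc == NULL) {
        m->busy = false; /* error: clear busy before anyone can test it */
        ret = -1;
    }
    pthread_mutex_unlock(&m->lock);
    return ret;
}

/* Leave path: waits on the lock, so it sees either "not busy" or
 * "busy with a valid mc", never the intermediate state. */
void mcast_sketch_leave(struct mcast_sketch *m)
{
    pthread_mutex_lock(&m->lock);
    if (m->busy) {
        m->busy = false;
        assert(m->mc != NULL); /* the invariant the wait is meant to give */
        m->frees++;            /* stands in for ib_sa_free_multicast() */
        m->mc = NULL;
    }
    pthread_mutex_unlock(&m->lock);
}
```

With this shape, a leave that races with a join blocks on the lock until the join has either cleared the flag (error case) or left it set with mc assigned (success case), which are the two states described above.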
 
- Jack


