[ofa-general] [PATCH] IB/core - possible bug in handling link down in ib_sa_join_multicast()

Sean Hefty mshefty at ichips.intel.com
Thu Sep 20 17:00:30 PDT 2007


Ralph Campbell wrote:
> I was looking at the code for multicast.c and noticed that
> ib_sa_join_multicast() calls queue_join() which puts the
> request at the front of the group->pending_list.  If this
> is a second request, it seems like it would interfere with
> process_join_error() since group->last_join won't point
> to the member at the head of the pending_list. The sequence
> would thus be:

Thanks.  This does indeed appear to be a bug, which your patch should 
fix.  However, to clarify, the problem is really:

> 1. ib_sa_join_multicast()
>    // puts member1 on head of pending_list and starts work thread
> 2. mcast_work_handler()
>    // calls send_join() which sets group->last_join to member1
> 3. ib_sa_join_multicast()
>    // puts member2 on head of pending_list
> 4. IB_EVENT_PORT_ERR event calls mcast_groups_lost()
>    // sets group->state to MCAST_ERROR

replace 4 above with:

4. Join operation fails with non-zero status.  I.e. the problem is 
related to a failure response from the SA, perhaps due to an invalid 
setting for the multicast group, and not related to a port event.

> 5. join_handler() is called with error status
> 6. process_join_error() fails to process member1 since
>    it doesn't match the first entry in the group->pending_list.

The impact is that the failed join request gets tossed.  The request at 
the head of the pending_list now gets processed.  After it completes, 
the original request (from step 1) ends up trying again.  So, everything 
should eventually work out as expected.

Roland, can we please queue this for 2.6.24?  Would you like it 
resubmitted with an updated patch description?

- Sean

> Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
> 
> diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
> index 15b4c4d..1bc1fe6 100644
> --- a/drivers/infiniband/core/multicast.c
> +++ b/drivers/infiniband/core/multicast.c
> @@ -196,7 +196,7 @@ static void queue_join(struct mcast_member *member)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&group->lock, flags);
> -	list_add(&member->list, &group->pending_list);
> +	list_add_tail(&member->list, &group->pending_list);
>  	if (group->state == MCAST_IDLE) {
>  		group->state = MCAST_BUSY;
>  		atomic_inc(&group->refcount);
> 
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 



More information about the general mailing list