[ofa-general] Re: [ewg] Re: [PATCH] IB/ipoib: set neigh->dgid upon ipoib_neigh creation

Moni Shoua monis at Voltaire.COM
Sun Mar 29 02:07:09 PDT 2009


Eli Cohen wrote:
> On Wed, Mar 25, 2009 at 09:20:19PM +0200, Yossi Etigin wrote:
>> Wouldn't this patch break the fast bond+SM failover recovery? 
>>
>> Up till now, in such case neigh->dgid was the old gid until path record  
>> query was successful, and that caused to retry the path query until it's  
>> successful. With that patch, a new path query will not be issued until  
>> the next arp refresh because neigh->dgid will be set to a valid value  
>> right after the first packet.
> Looks to me like 
> 
>> How about keeping the memcpy in path_rec-completion, and also adding it  
>> to ib_mcast_send (where ipoib neigh is created)?
>>
> I did not like the idea that a unicast ipoib_neigh can be created and
> destroyed till the path is resolved so I preferred to initialize at
> creation time. How about the following patch to solve the problem of
> failure to resolve a path:
> 
> 
>>From 4e6e3dd0186803f7547f5e4c233d7525dfcdaec6 Mon Sep 17 00:00:00 2001
> From: Eli Cohen <eli at mellanox.co.il>
> Date: Wed, 25 Mar 2009 16:22:28 +0200
> Subject: [PATCH] IB/ipoib: set neigh->dgid upon ipoib_neigh creation
> 
> If we fail to do that, there can be superfluous calls to
> ipoib_neigh_free()/ipoib_neigh_alloc() due to the following if statement in
> ipoib_start_xmit():
> 
> 	if (unlikely((memcmp(&neigh->dgid.raw,
> 			     skb->dst->neighbour->ha + 4,
> 			     sizeof(union ib_gid))) ||
> 
> In the case of a unicast GID, it can happen until the path is resolved and is
> not so severe. In the case of a multicast address, neigh->dgid is never even
> updated, so the corresponding ipoib_neigh will be destroyed and created on
> every multicast packet sent.
> 
> Signed-off-by: Eli Cohen <eli at mellanox.co.il>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 0bd2a4f..b680483 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -460,8 +460,6 @@ static void path_rec_completion(int status,
>  			}
>  			kref_get(&path->ah->ref);
>  			neigh->ah = path->ah;
> -			memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
> -			       sizeof(union ib_gid));
>  
>  			if (ipoib_cm_enabled(dev, neigh->neighbour)) {
>  				if (!ipoib_cm_get(neigh))
> @@ -481,7 +479,9 @@ static void path_rec_completion(int status,
>  				__skb_queue_tail(&skqueue, skb);
>  		}
>  		path->valid = 1;
> -	}
> +	} else
> +		list_for_each_entry_safe(neigh, tn, &path->neigh_list, list)
> +			memset(&neigh->dgid, 0, sizeof neigh->dgid);
>  
>  	path->query = NULL;
>  	complete(&path->done);
> @@ -878,6 +878,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
>  	neigh->neighbour = neighbour;
>  	neigh->dev = dev;
>  	*to_ipoib_neigh(neighbour) = neigh;
> +	memcpy(neigh->dgid.raw, neighbour->ha + 4, 16);
>  	skb_queue_head_init(&neigh->queue);
>  	ipoib_cm_set(neigh, NULL);
>  

This might work but still, I don't understand what this patch fixes. What was the initial problem? 
I agree with Yossi's early comments that the patch breaks the fast fail over we've worked on in the past.



More information about the ewg mailing list