[Openib-windows] LID change event

Yossi Leybovich sleybo at mellanox.co.il
Tue Jul 4 08:09:53 PDT 2006


Hi

We found more cases where IPoIB discovers a duplicate LID in its endpt
list (even after we clean the LID list in ipoib_reset_all).
This can be caused by old packets in the network (a received packet
creates the p_src endpt if it does not exist, and the packet can carry
the old LID).
I think this patch reduces the possibility of getting duplicate
entries in the LID list.
It inserts into the LID list only when the path record query returns
(with the AV).

Moreover, just as we create the endpt entry in recv_arp with LID 0
(because the source LID may not be that of the original initiator), we
should do the same in the recv_get_endpt function and wait for the LID
from the path record query.

I also added an assert to check for duplication in the path_record_cb.
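
As a rough illustration of the approach above (this is not the driver
code), the sketch below uses a toy array-backed map in place of cl_qmap.
The names lid_map_t, lid_map_insert, on_recv_new_source and
on_path_record are made up for this example; the only behavior borrowed
from complib is that an insert with an existing key returns the item
already stored under that key. Unlike the patch, the sketch sets dlid
only after a successful insert, so that the toy state stays consistent.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENDPTS 16

/* Hypothetical, stripped-down endpoint: dlid == 0 means "LID unknown". */
typedef struct endpt {
    uint16_t dlid;
} endpt_t;

/* Toy stand-in for p_port->endpt_mgr.lid_endpts. */
typedef struct lid_map {
    uint16_t keys[MAX_ENDPTS];
    endpt_t *items[MAX_ENDPTS];
    size_t   count;
} lid_map_t;

/*
 * Returns the item stored under 'lid' after the call: 'e' if it was
 * inserted, the existing item if the key was already present, or NULL
 * if the table is full.
 */
static endpt_t *lid_map_insert(lid_map_t *m, uint16_t lid, endpt_t *e)
{
    size_t i;

    for (i = 0; i < m->count; i++) {
        if (m->keys[i] == lid)
            return m->items[i];
    }
    if (m->count == MAX_ENDPTS)
        return NULL;
    m->keys[m->count] = lid;
    m->items[m->count] = e;
    m->count++;
    return e;
}

/* Receive path: create the endpoint with LID 0, leave the map alone. */
static void on_recv_new_source(endpt_t *e)
{
    e->dlid = 0;
}

/*
 * Path record completion: the only place a LID enters the map.
 * Returns 1 if the endpoint owns 'path_dlid' afterwards, 0 on a
 * duplicate (the condition the CL_ASSERT in the patch checks for)
 * or a full table.
 */
static int on_path_record(lid_map_t *m, endpt_t *e, uint16_t path_dlid)
{
    if (e->dlid != 0)
        return e->dlid == path_dlid;   /* already resolved earlier */
    if (lid_map_insert(m, path_dlid, e) != e)
        return 0;
    e->dlid = path_dlid;
    return 1;
}
```

With this shape, a stale packet can still create an endpt, but it can
no longer push its old LID into the map; only the path record result
does that.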

Another option is to check, on each insertion into the LID list, whether
the LID already exists in the list; if it does, remove the existing
entry from the LID list and zero the LID field of its endpt struct.
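
A minimal sketch of this second option, again with a toy array-backed
map instead of cl_qmap. lid_map_t and lid_map_insert_evict are
hypothetical names used only for illustration; the real code would have
to do the equivalent with the complib map calls under the port lock.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENDPTS 16

/* Hypothetical, stripped-down endpoint: dlid == 0 means "LID unknown". */
typedef struct endpt {
    uint16_t dlid;
} endpt_t;

/* Toy stand-in for the port's LID map. */
typedef struct lid_map {
    uint16_t keys[MAX_ENDPTS];
    endpt_t *items[MAX_ENDPTS];
    size_t   count;
} lid_map_t;

/*
 * Insert 'e' under 'lid'. If another endpoint already holds that LID,
 * drop it from the map and zero its dlid so that a later path record
 * query resolves it again. Overwriting the slot is the toy equivalent
 * of "remove the old entry, then insert the new one".
 * Returns 1 on success, 0 if the table is full.
 */
static int lid_map_insert_evict(lid_map_t *m, uint16_t lid, endpt_t *e)
{
    size_t i;

    for (i = 0; i < m->count; i++) {
        if (m->keys[i] == lid) {
            if (m->items[i] != e)
                m->items[i]->dlid = 0;   /* stale owner must re-resolve */
            m->items[i] = e;
            e->dlid = lid;
            return 1;
        }
    }
    if (m->count == MAX_ENDPTS)
        return 0;
    m->keys[m->count] = lid;
    m->items[m->count] = e;
    m->count++;
    e->dlid = lid;
    return 1;
}
```

The trade-off is that the newest claim on a LID always wins, so a stale
packet could still evict a valid entry until the next path record query
corrects it.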


Signed-off-by: Yossi Leybovich (sleybo at mellanox.co.il)

Index: ipoib_endpoint.c
===================================================================
--- ipoib_endpoint.c	(revision 1505)
+++ ipoib_endpoint.c	(working copy)
@@ -413,6 +413,8 @@
 	
 	if( !p_endpt->dlid )
 	{
+		cl_map_item_t	*p_qitem;
+
 		/* This is a subnet local endpoint that does not have its LID set. */
 		p_endpt->dlid = p_path->dlid;
 		/*
@@ -420,8 +422,9 @@
 		 * traffic will resolve it properly.
 		 */
 		cl_obj_lock( &p_port->obj );
-		cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
+		p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
 			p_endpt->dlid, &p_endpt->lid_item );
+		CL_ASSERT( p_qitem == &p_endpt->lid_item );
 		cl_obj_unlock( &p_port->obj );
 	}
 	av_attr.static_rate = ib_path_rec_rate( p_path );
Index: ipoib_port.c
===================================================================
--- ipoib_port.c	(revision 1505)
+++ ipoib_port.c	(working copy)
@@ -1711,7 +1711,7 @@
 #else	/* IPOIB_INLINE_RECV */
 			*pp_src = ipoib_endpt_create( &p_desc->p_buf->ib.grh.src_gid,
 #endif	/* IPOIB_INLINE_RECV */
-				p_wc->recv.ud.remote_lid, p_wc->recv.ud.remote_qp );
+				0, p_wc->recv.ud.remote_qp );
 			if( !*pp_src )
 			{
 				IPOIB_PRINT_EXIT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,

> -----Original Message-----
> From: openib-windows-bounces at openib.org 
> [mailto:openib-windows-bounces at openib.org] On Behalf Of Yossi 
> Leybovich
> Sent: Tuesday, July 04, 2006 10:31 AM
> To: Fab Tillier
> Cc: openib-windows at openib.org
> Subject: [Openib-windows] LID change event
> 
> Fab
>  
> We checked IPoIB on a small cluster (16 nodes).
> While running with the checked build we got an ASSERT while
> inserting an entry into the lid_endpts list.
> 	
> Line 4301 in ipoib_port:
> 
> 	if( p_endpt->dlid )
> 	{
> 		p_qitem = cl_qmap_insert(
> 			&p_port->endpt_mgr.lid_endpts, p_endpt->dlid, &p_endpt->lid_item );
> 		CL_ASSERT( p_qitem == &p_endpt->lid_item );
> 	}
> 
> (it looks like the LID already exists in the map)
> 
> 
> It seems that IPoIB does not handle the case where a new SM
> changes the LID assignments of the nodes.
> I think the solution to this problem is to clean the endpt LID
> list while flushing the AVs of the endpts.
> The code will refill the LID field when the path record query returns.
>  
> We tried this patch and it works.
>  
>  
> Index: W:/work/latest/ulp/ipoib/kernel/ipoib_port.c
> ===================================================================
> --- W:/work/latest/ulp/ipoib/kernel/ipoib_port.c (revision 1496)
> +++ W:/work/latest/ulp/ipoib/kernel/ipoib_port.c (working copy)
> @@ -3994,11 +3994,7 @@
>      &p_endpt->mac_item );
>     cl_fmap_remove_item( &p_port->endpt_mgr.gid_endpts,
>      &p_endpt->gid_item );
> -   if( p_endpt->dlid )
> -   {
> -    cl_qmap_remove_item( &p_port->endpt_mgr.lid_endpts,
> -     &p_endpt->lid_item );
> -   }
> +
>     cl_qlist_insert_tail(
>      &mc_list, &p_endpt->mac_item.pool_item.list_item );
>    }
> @@ -4008,6 +4004,14 @@
>     p_port->p_adapter->p_ifc->destroy_av( p_endpt->h_av );
>     p_endpt->h_av = NULL;
>    }
> +  
> +  if( p_endpt->dlid )
> +  {
> +   cl_qmap_remove_item( &p_port->endpt_mgr.lid_endpts,
> +    &p_endpt->lid_item );
> +   p_endpt->dlid = 0;
> +  }
> +  
>   }
>   cl_obj_unlock( &p_port->obj );
>  
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_dup_lid.patch
Type: application/octet-stream
Size: 1261 bytes
Desc: ipoib_dup_lid.patch
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20060704/c60bc4f7/attachment.obj>

