[ofw] RE: HPC head-node slow down when OpenSM is started on the head-node (RC4, svn.1691).

Tzachi Dar tzachid at mellanox.co.il
Wed Oct 29 09:01:01 PDT 2008


Hi Stan,

In very short: please apply the attached patch and tell me if it works.

If you want the short version, start reading from the conclusions near the end of this mail.

Here is the long story:
==========================


Here is a short analysis of your log. port_up appears at these log lines:

3115 port_up
4196 port_up
7713 port_up

Since I don't see any port_down event, I suspect this explains why we
never moved the QP to the error state.
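Below is a minimal model of why a missing port_down matters. It assumes two points from this thread: the flush loop in the patch below (ipoib_port_up spins until recv_mgr.depth and send_mgr.depth are both zero), and the remark in the quoted mail at the end that port_down closes the QPs so posted receives are freed. `port_model_t`, `model_port_down()` and `model_wait_for_flush()` are hypothetical names, not IPoIB code. Unlike the real loop, this one is capped so it always returns:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the port's work-request bookkeeping. The field
 * names echo recv_mgr.depth / send_mgr.depth, but this is not driver code. */
typedef struct {
    int recv_depth; /* receives posted to the QP, not yet completed */
    int send_depth; /* sends posted to the QP, not yet completed */
} port_model_t;

/* Models port_down: moving the QP to the error state completes (flushes)
 * every posted work request, so both depths drop to zero. */
static void model_port_down(port_model_t *p)
{
    p->recv_depth = 0;
    p->send_depth = 0;
}

/* Models the flush wait at the top of ipoib_port_up. The real loop is
 * unbounded and yields with cl_thread_suspend(0). This model caps the
 * number of polls so it always returns, and reports whether the queues
 * ever drained. */
static bool model_wait_for_flush(const port_model_t *p, long max_polls)
{
    assert(max_polls > 0);
    for (long i = 0; i < max_polls; i++) {
        if (p->recv_depth == 0 && p->send_depth == 0)
            return true;
        /* Nothing in this loop can change the depths; only a flush can. */
    }
    return false;
}
```

With receives still posted and no port_down, `model_wait_for_flush()` returns false however long it polls. After `model_port_down()` it returns true. The real loop has no cap, so in the first case it would never return.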

Now, let's look at what pnp events we got:

grep -n "__ipoib_pnp_cb():" q:\temp\head-node-slowdown-log.TXT

62:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD) object state IB_PNP_PORT_ADD
63:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD)
1112:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD) object state IB_PNP_PORT_ADD
1115:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN) object state IB_PNP_PORT_ADD
1116:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN)
1117:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN) object state IB_PNP_PORT_DOWN
1120:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE) object state IB_PNP_PORT_DOWN
1121:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE)
1124:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE) object state IB_PNP_PORT_DOWN
1233:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT) object state IB_PNP_PORT_DOWN
1234:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT)
1235:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT) object state IB_PNP_PORT_DOWN
3113:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE) object state IB_PNP_PORT_DOWN
3114:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE)
3135:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE) object state IB_PNP_PORT_DOWN
4194:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE) object state IB_PNP_PORT_DOWN
4195:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE)
4207:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE) object state IB_PNP_PORT_INIT
6067:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE) object state IB_PNP_PORT_ACTIVE
6068:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE)
6069:[IPoIB]:__ipoib_pnp_cb(): IPOIB: Received unhandled PnP event 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE)
6070:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE) object state IB_PNP_PORT_ACTIVE
7710:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x20002 (IB_PNP_PORT_ACTIVE) object state IB_PNP_PORT_ACTIVE
7711:[IPoIB]:__ipoib_pnp_cb(): p_pnp_rec->pnp_event = 0x20002 (IB_PNP_PORT_ACTIVE)

Here is my understanding from the log:
1) For each event we print three times (beginning, "middle", and end).
2) The last one (port active) is printed only twice => we never reach
the end.
3) IB_PNP_SUBNET_TIMEOUT_CHANGE is actually some error flow (that we
probably don't handle well).
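Point 2 can be checked mechanically instead of by eye. The sketch below assumes the trace has already been parsed into (entering/exiting, pnp_event) records, like the labeled logs further down. `trace_rec_t` and `find_unfinished_event()` are hypothetical helpers, not part of IBAL or IPoIB:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pre-parsed form of one __ipoib_pnp_cb() trace line. */
typedef struct {
    bool         entering;  /* true for "entering", false for "exiting" */
    unsigned int pnp_event; /* e.g. 0x20002 for IB_PNP_PORT_ACTIVE */
} trace_rec_t;

/* Returns the index of the first "entering" record that is never followed
 * by an "exiting" record for the same event, or -1 if every call returned.
 * Assumes PnP callbacks are not nested, as in the logs in this mail. */
static long find_unfinished_event(const trace_rec_t *recs, size_t n)
{
    long open_idx = -1; /* index of the currently open "entering" record */

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].entering) {
            if (open_idx >= 0)
                return open_idx; /* a new call began; the previous never exited */
            open_idx = (long)i;
        } else if (open_idx >= 0 &&
                   recs[open_idx].pnp_event == recs[i].pnp_event) {
            open_idx = -1; /* matching exit: the callback returned */
        }
    }
    return open_idx;
}
```

If your log were reduced to such records, the final IB_PNP_PORT_ACTIVE (0x20002) call would be the one reported.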


Some more info:
The first port_up ended very quickly with IB_INVALID_GUID (probably fine).

The second port_up was triggered by the SM_CHANGE notification.
Everything goes well: port_up returns and a __bcast_get is
scheduled (the adapter state changes to IB_PNP_PORT_INIT).
__bcast_get_cb is called and we schedule a bc_join.
__bcast_cb() is called, everything works well, and the port is now up!
Good (line 5315).


From line 5315 until line 6067 (IB_PNP_SUBNET_TIMEOUT_CHANGE) we don't
do anything special (we just receive and send).
For IB_PNP_SUBNET_TIMEOUT_CHANGE we do nothing at all (so is it being
blamed for no reason?).

On line 7710 we get the PORT_ACTIVE event. Up to that point everything
was fine!

For PORT_ACTIVE we call port_up, and we get stuck!


Here is another log (from my machine) that worked well:

00000002	0.20333815	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD) object state IB_PNP_PORT_ADD
00000003	0.26569372	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD) object state IB_PNP_PORT_ADD
00000004	0.32821512	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN) object state IB_PNP_PORT_ADD
00000005	0.39069474	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN) object state IB_PNP_PORT_DOWN
00000006	0.45350561	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE) object state IB_PNP_PORT_DOWN
00000007	0.51572257	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE) object state IB_PNP_PORT_DOWN
00000008	8.39064217	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT) object state IB_PNP_PORT_DOWN
00000009	8.45311928	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT) object state IB_PNP_PORT_DOWN
00000010	8.51558018	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE) object state IB_PNP_PORT_DOWN
00000011	8.60924816	ipoib_port_up called[IPoIB]:__port_get_bcast() !ERROR!: ib_query returned IB_INVALID_GUID
00000012	8.67163849	[IPoIB]:ipoib_port_up() !ERROR!: __port_get_bcast returned IB_INVALID_GUID
00000013	8.73410130	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE) object state IB_PNP_PORT_DOWN
00000014	8.79652596	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE) object state IB_PNP_PORT_DOWN
00000015	8.89036465	ipoib_port_up called [IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE) object state IB_PNP_PORT_INIT
00000016	8.95284176	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE) object state IB_PNP_PORT_INIT
00000017	9.01540661	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE) object state IB_PNP_PORT_INIT
00000018	9.07785797	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x10002 (IB_PNP_PORT_ARMED) object state IB_PNP_PORT_INIT
00000019	9.14059353	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x10002 (IB_PNP_PORT_ARMED) object state IB_PNP_PORT_INIT
00000020	9.20303917	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x20002 (IB_PNP_PORT_ACTIVE) object state IB_PNP_PORT_INIT
00000021	9.29688072	ipoib_port_up called [IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x20002 (IB_PNP_PORT_ACTIVE) object state IB_PNP_PORT_INIT
00000022	9.35899925	[IPoIB]:__endpt_mgr_add_local(): <__endpt_mgr_add_local>:  av_attr.dlid = p_port_info->base_lid = 256

It seems that in this log we had two ipoib_port_up calls without a
port_down in between, and things still worked well?!

So I added some more prints and got the following log:

00000000	0.00000000	[AL]al_dev_close() !ERROR!: Client closed with a null open context .
00000001	0.20301315	[IPoIB]:ipoib_get_adapter_params(): Read configuration.Registry GUIDMask value not found, setting default value= 0x0
00000002	0.26544237	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD) object state IB_PNP_PORT_ADD
00000003	0.32791898	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x1002 (IB_PNP_PORT_ADD) object state IB_PNP_PORT_ADD
00000004	0.39061481	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN) object state IB_PNP_PORT_ADD
00000005	0.45292121	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x40002 (IB_PNP_PORT_DOWN) object state IB_PNP_PORT_DOWN
00000006	0.51541978	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE) object state IB_PNP_PORT_DOWN
00000007	0.57790291	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x80000000 (IB_PNP_REG_COMPLETE) object state IB_PNP_PORT_DOWN
00000008	6.39010382	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT) object state IB_PNP_PORT_DOWN
00000009	6.45256996	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x8002 (IB_PNP_PORT_INIT) object state IB_PNP_PORT_DOWN
00000010	6.51511383	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE) object state IB_PNP_PORT_DOWN
00000011	6.54631376	ipoib_port_up called
00000012	6.60886145	[IPoIB]:__port_get_bcast() !ERROR!: ib_query returned IB_INVALID_GUID
00000013	6.67132616	[IPoIB]:ipoib_port_up() !ERROR!: __port_get_bcast returned IB_INVALID_GUID
00000014	6.73394728	[IPoIB]:ipoib_port_up() !ERROR!: Exiting status = 28
00000015	6.79705715	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x400002 (IB_PNP_LID_CHANGE) object state IB_PNP_PORT_DOWN
00000016	6.85884714	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE) object state IB_PNP_PORT_DOWN
00000017	6.89006758	ipoib_port_up called
00000018	6.95262957	[IPoIB]:ipoib_port_up() !ERROR!: Exiting status = 0
00000019	7.01502085	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x100002 (IB_PNP_SM_CHANGE) object state IB_PNP_PORT_INIT
00000020	7.07754707	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE) object state IB_PNP_PORT_INIT
00000021	7.14003515	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x800002 (IB_PNP_SUBNET_TIMEOUT_CHANGE) object state IB_PNP_PORT_INIT
00000022	7.20254660	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x10002 (IB_PNP_PORT_ARMED) object state IB_PNP_PORT_INIT
00000023	7.26509857	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x10002 (IB_PNP_PORT_ARMED) object state IB_PNP_PORT_INIT
00000024	7.32752895	[IPoIB]:__ipoib_pnp_cb(): entering p_pnp_rec->pnp_event = 0x20002 (IB_PNP_PORT_ACTIVE) object state IB_PNP_PORT_INIT
00000025	7.35878468	ipoib_port_up called
00000026	7.42124557	[IPoIB]:ipoib_port_up() !ERROR!: Exiting status = 0
00000027	7.48375750	[IPoIB]:__ipoib_pnp_cb(): exiting p_pnp_rec->pnp_event = 0x20002 (IB_PNP_PORT_ACTIVE) object state IB_PNP_PORT_INIT
00000028	7.54626369	[IPoIB]:__bcast_get_cb() !ERROR!: status of request IB_SUCCESS
00000029	7.60889435	[IPoIB]:__endpt_mgr_add_local(): <__endpt_mgr_add_local>:  av_attr.dlid = p_port_info->base_lid = 256
00000030	7.64053488	Port is up !!!!!!!!
00000031	7.98372602	[IPoIB]:__bcast_get_cb() !ERROR!: port_up is canceled because port->state2 8

From these logs I have come to the following conclusions:

1) The PnP mechanism can call port_up several times, even without
calling port_down.
2) There are two cases (actually races) that let us get through this
without getting stuck:
A - The two port_up calls happen very close together, so we don't have
time to stop the port.
B - By the time __bcast_get_cb() runs, the port state has already
changed (the port is up), so we silently stop the execution of port_up.
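Case B can be replayed step by step. The sketch below is a deliberately simplified, hypothetical model of the pre-patch logic. `old_port_t`, `old_port_up()` and `old_bcast_get_cb()` are invented for illustration, and the broadcast query and join are collapsed into one callback. It follows the order at the end of the second log: two port_up calls with no port_down in between, then two __bcast_get_cb() completions. The second completion finds the port already up and is canceled:

```c
#include <assert.h>

/* Hypothetical model states standing in for the driver's IB_QPS_* values. */
typedef enum { OLD_DOWN, OLD_INIT, OLD_RTS } old_state_t;

/* Hypothetical stand-ins for IB_SUCCESS / IB_CANCELED. */
typedef enum { OLD_OK, OLD_CANCELED } old_status_t;

typedef struct {
    old_state_t state;
    int queries_issued; /* broadcast queries started by port_up, not yet completed */
} old_port_t;

/* Pre-patch ipoib_port_up, reduced to its effect on the model: it forces
 * the state to INIT unconditionally and starts a broadcast query. */
static void old_port_up(old_port_t *p)
{
    p->state = OLD_INIT;
    p->queries_issued++;
}

/* Pre-patch __bcast_get_cb, reduced: if the port has left INIT it quietly
 * returns "canceled"; otherwise (query and join collapsed into one step)
 * the port becomes RTS. */
static old_status_t old_bcast_get_cb(old_port_t *p)
{
    assert(p->queries_issued > 0);
    p->queries_issued--;
    if (p->state != OLD_INIT)
        return OLD_CANCELED;
    p->state = OLD_RTS;
    return OLD_OK;
}
```

The model leaves out the flush wait at the top of port_up. Presumably, when the second port_up arrives after receives have been preposted, as on your machine, it never gets far enough to issue a query at all.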

To avoid a total revolution in the code, I have concluded that the best
fix is to make sure that only one instance of port_up can be running at
a time.
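The core of this idea is that the state check and the move to INIT happen under the same lock, and that every completion path leaves INIT: to RTS on success, to ERROR on failure. The user-space sketch below shows that pattern, assuming POSIX threads in place of cl_obj_lock()/cl_obj_unlock(). `guard_port_t`, `guard_port_up_begin()` and `guard_port_up_end()` are hypothetical names, not driver code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; the patch uses cl_obj_lock()/cl_obj_unlock() on
 * p_port->obj and the IB_QPS_* values for the state. */
typedef enum { G_DOWN, G_INIT, G_RTS, G_ERROR } guard_state_t;

typedef struct {
    pthread_mutex_t lock;
    guard_state_t   state;
} guard_port_t;

/* Returns true if this caller now owns the bring-up, false if another
 * port_up is already in progress (INIT) or has finished (RTS). The check
 * and the transition to INIT happen under one lock, so two concurrent
 * callers cannot both get true. */
static bool guard_port_up_begin(guard_port_t *p)
{
    bool owner = false;

    pthread_mutex_lock(&p->lock);
    if (p->state != G_INIT && p->state != G_RTS) {
        p->state = G_INIT;
        owner = true;
    }
    pthread_mutex_unlock(&p->lock);
    return owner;
}

/* Called from the completion path by the owner: INIT moves to RTS on
 * success or to ERROR on failure, so a later port_up may retry. */
static void guard_port_up_end(guard_port_t *p, bool success)
{
    pthread_mutex_lock(&p->lock);
    assert(p->state == G_INIT);
    p->state = success ? G_RTS : G_ERROR;
    pthread_mutex_unlock(&p->lock);
}
```

A second caller that arrives while the first is still in INIT, or after it has reached RTS, gets false and returns. This corresponds to the "Aborting" path in the patch below. After a failure the state is ERROR, so a later port_up can try again.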

So here is the patch that fixes the problem (as I see it).

Comments please  

Thanks
Tzachi

Index: ulp/ipoib/kernel/ipoib_port.c
===================================================================
--- ulp/ipoib/kernel/ipoib_port.c	(revision 1708)
+++ ulp/ipoib/kernel/ipoib_port.c	(working copy)
@@ -5136,14 +5136,24 @@
 
 	IPOIB_ENTER( IPOIB_DBG_INIT );
 
+	cl_obj_lock( &p_port->obj );
+	if ((p_port->state == IB_QPS_INIT) ||
+		(p_port->state == IB_QPS_RTS)){
+		cl_obj_unlock( &p_port->obj );
+		status = STATUS_SUCCESS;
+		IPOIB_PRINT_EXIT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
+			("p_port->state = %d - Aborting.\n", p_port->state) );
+		goto up_done;
+	}
+	p_port->state = IB_QPS_INIT;
+	cl_obj_unlock( &p_port->obj );  
+
+
 	/* Wait for all work requests to get flushed. */
 	while( p_port->recv_mgr.depth || p_port->send_mgr.depth )
 		cl_thread_suspend( 0 );
 
-	cl_obj_lock( &p_port->obj );
-	p_port->state = IB_QPS_INIT;
 	KeResetEvent( &p_port->sa_event );
-	cl_obj_unlock( &p_port->obj );
 
 	mad_out = (ib_mad_t*)cl_zalloc(256);
 	if(! mad_out)
@@ -5204,6 +5214,8 @@
 			__endpt_mgr_reset_all( p_port );
 		}
 		KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
+		ASSERT(p_port->state == IB_QPS_INIT);
+		p_port->state = IB_QPS_ERROR;
 	}
 
 	if(mad_out)
@@ -5344,14 +5356,14 @@
 
 	cl_obj_lock( &p_port->obj );
 	p_port->ib_mgr.h_query = NULL;
-	if( p_port->state != IB_QPS_INIT )
-	{
-		status = IB_CANCELED;
-		goto done;
-	}
 
+	CL_ASSERT(p_port->state == IB_QPS_INIT);
 	status = p_query_rec->status;
 
+	IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
+		("status of request %s\n", 
+		p_port->p_adapter->p_ifc->get_err_str( status )) );
+
 	switch( status )
 	{
 	case IB_SUCCESS:
@@ -5381,7 +5393,6 @@
 			EVENT_IPOIB_BCAST_GET, 1, p_query_rec->status );
 	}
 
-done:
 	cl_obj_unlock( &p_port->obj );
 
 	if( status != IB_SUCCESS )
@@ -5621,22 +5632,8 @@
 	p_port = (ipoib_port_t*)p_mcast_rec->mcast_context;
 
 	cl_obj_lock( &p_port->obj );
-	if( p_port->state != IB_QPS_INIT )
-	{
-		cl_obj_unlock( &p_port->obj );
-		if( p_mcast_rec->status == IB_SUCCESS )
 
-		{
-			ipoib_port_ref(p_port, ref_leave_mcast);
-			p_port->p_adapter->p_ifc->leave_mcast( p_mcast_rec->h_mcast, __leave_error_mcast_cb );
-		}
-		KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
-		ipoib_port_deref( p_port, ref_bcast_inv_state );
-		IPOIB_PRINT_EXIT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
-			("Invalid state - Aborting.\n") );
-		return;
-	}
-
+	ASSERT(p_port->state == IB_QPS_INIT);
 	status = p_mcast_rec->status;
 
 	if( status != IB_SUCCESS )
@@ -5685,6 +5682,8 @@
 			ipoib_set_inactive( p_port->p_adapter );
 			__endpt_mgr_reset_all( p_port );
 			KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
+			ASSERT(p_port->state == IB_QPS_INIT);
+			p_port->state = IB_QPS_ERROR;
 		}
 		ipoib_port_deref( p_port, ref_bcast_req_failed );
 		IPOIB_EXIT( IPOIB_DBG_INIT );
@@ -5729,6 +5728,8 @@
 err:
 		/* Flag the adapter as hung. */
 		p_port->p_adapter->hung = TRUE;
+		ASSERT(p_port->state == IB_QPS_INIT);
+		p_port->state = IB_QPS_ERROR;        
 		KeSetEvent( &p_port->sa_event, EVENT_INCREMENT, FALSE );
 		ipoib_port_deref( p_port, ref_bcast_error );
 		IPOIB_EXIT( IPOIB_DBG_INIT );
@@ -5737,8 +5738,8 @@
 
 	cl_obj_lock( &p_port->obj );
 	/* Only change the state if we're still in INIT. */
-	if( p_port->state == IB_QPS_INIT )
-		p_port->state = IB_QPS_RTS;
+	ASSERT( p_port->state == IB_QPS_INIT );
+	p_port->state = IB_QPS_RTS;
 	cl_obj_unlock( &p_port->obj );
 
 	/* Prepost receives. */


> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com] 
> Sent: Wednesday, October 29, 2008 2:16 AM
> To: Tzachi Dar; Leonid Keller; Fab Tillier
> Cc: Ishai Rabinovitz; ofw at lists.openfabrics.org
> Subject: RE: HPC head-node slow down when OpenSM is started 
> on the head-node (RC4,svn.1691).
> 
> Here's a first take at a log dump. Sadly I goofed and switched 
> DebugFlags & DebugLevel settings, so there is 'extra' noise. 
> Will clean up the prints, add port_down() calls, correct the Debug 
> settings and redo the data capture.
> 
> Thought you might want to see the raw data anyway.
> 
> 
> Tzachi Dar wrote:
> > Hi,
> >
> > This is a bug that can have a few causes (see below), and we 
> > have been able to reproduce it here, although we are still seeing 
> > different issues. If you can find out how to reproduce it without 
> > another SM, that would be great.
> 
> On an installed OpenIB system (Ibcore+IPoIB), start OpenSM on 
> the head-node of an HPC cluster. Set the IPoIB IPv4 address and 
> wait a few minutes; you will see the Task Manager CPU performance 
> hover around 25% utilization. The head-node is the only node 
> on the fabric.
> 
> Stan.
> 
> >
> > As to your problem: generally speaking, the code is stuck in
> > ipoib_port_up. There is an assumption that ipoib_port_up is called
> > after ipoib_port_down. I am 99% sure that you have found a flow in
> > which this is not the case. On port down we close the QPs so that
> > all packets that have been posted for receive are freed.
> >
> > Taking one step up, it seems that we have a problem in the plug
> > and play mechanism (IBAL). In general you are being called from the
> > function __ipoib_pnp_cb. This function is quite complicated, as it
> > takes into account the current state of IPoIB and also the
> > different events.
> >
> > In order for me to have more information, please send me a log with
> > prints at the following places.
> > Please add the following print at the start of __ipoib_pnp_cb (as
> > early as possible):
> >       IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
> >               ("p_pnp_rec->pnp_event = 0x%x (%s) object state %s\n",
> >               p_pnp_rec->pnp_event, ib_get_pnp_event_str( p_pnp_rec->pnp_event ),
> >               ib_get_pnp_event_str(Adapter->state)) );
> >
> > On exit from this function please also print the same line (I want
> > to see the state changes).
> >
> > Please add a print in ipoib_port_up and ipoib_port_down,
> >
> > and send us the log. I hope that I'll be able to figure out what is
> > going on there.
> >
> > And again, a simple repro will help (probably even more).
> >
> > Thanks
> > Tzachi
> >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_port_up_race.patch
Type: application/octet-stream
Size: 3585 bytes
Desc: ipoib_port_up_race.patch
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20081029/4074e667/attachment.obj>

