[ofw] IPoIB crashes with SRP bad shutdown?

Leonid Keller leonid at mellanox.co.il
Thu Feb 14 04:07:36 PST 2008


> Has anyone had experience with IPoIB dying when an SRP initiator is
> not manually ejected before the target is shutdown.
> How is IPoIB related to SRP Initiator? Why would one crash the other?
 
Here is a possible answer and a patch from Yossi Leibovich.
 
While working with the Windows SRP initiator against a Linux SRPT target
(OFED), I discovered that after the SRPT is removed, the Windows side
stops receiving MADs.
The problem is that SRP, while handling the target's DREQ callback,
tries to reconnect and waits indefinitely (in the context of the
callback thread) for the connect operation to finish. That can happen
only when SRP gets either a timeout or a REP MAD, which cannot arrive
before the DREQ callback returns.
This is a deadlock. It prevents any normal MAD handling and could
explain the incorrect IPoIB behaviour.
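
To make the deadlock concrete, here is a small single-threaded model of it. This is only a sketch under assumptions: mad_dispatcher_t, mad_enqueue, on_dreq and mad_dispatch_one are hypothetical names, not WinOF or IBAL APIs, and the indefinite wait is replaced by a bounded spin so that the example terminates. It shows that a REP queued while the DREQ callback is running can only be delivered after that callback returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the WinOF API: one thread delivers every MAD. */
typedef enum { MAD_DREQ, MAD_REP } mad_type_t;

#define QUEUE_DEPTH 8

typedef struct {
    mad_type_t queue[QUEUE_DEPTH];
    size_t     head;          /* next MAD to deliver */
    size_t     tail;          /* next free slot */
    bool       rep_seen;      /* set only when the dispatcher delivers a REP */
    bool       block_in_dreq; /* true: old behaviour, false: patched behaviour */
    bool       wait_gave_up;  /* bounded wait expired without seeing a REP */
} mad_dispatcher_t;

static void mad_enqueue(mad_dispatcher_t *d, mad_type_t type)
{
    assert(d->tail < QUEUE_DEPTH);
    d->queue[d->tail++] = type;
}

/* DREQ callback. It runs on the dispatcher thread itself. */
static void on_dreq(mad_dispatcher_t *d)
{
    if (!d->block_in_dreq)
        return; /* patched behaviour: no reconnect from inside the callback */

    /* Old behaviour: start a reconnect. The target's REP arrives and is
     * queued, but the only thread that could deliver it is busy here. */
    mad_enqueue(d, MAD_REP);
    for (int spin = 0; spin < 1000; spin++) {
        if (d->rep_seen)
            return;
    }
    /* The real driver waited without a bound, so it never got this far. */
    d->wait_gave_up = true;
}

/* Deliver one queued MAD; returns false when the queue is empty. */
static bool mad_dispatch_one(mad_dispatcher_t *d)
{
    if (d->head == d->tail)
        return false;

    mad_type_t type = d->queue[d->head++];
    if (type == MAD_DREQ)
        on_dreq(d);
    else
        d->rep_seen = true;
    return true;
}
```

With block_in_dreq set, the first call to mad_dispatch_one leaves wait_gave_up set and rep_seen clear; the REP is seen only on the next call, after the callback has returned. With it clear, the callback returns at once and the dispatcher stays free for other MADs.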
 
 
This patch removes the reconnect code from the DREQ callback.
Any comments are welcome.
 
Index: srp_connection.c
===================================================================
--- srp_connection.c (revision 2166)
+++ srp_connection.c (working copy)
@@ -285,8 +285,8 @@
  ib_cm_drep_t    cm_drep;
  ib_api_status_t status;
  int             i;
- int             retry_count = 0;
 
+
  SRP_ENTER( SRP_DBG_PNP );
 
  SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
@@ -334,75 +334,9 @@
  SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
   ("Session Object ref_cnt = %d\n", p_srp_session->obj.ref_cnt) );
  cl_obj_destroy( &p_srp_session->obj );
-
- do
- {
-  retry_count++;
-
-  SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
-   ("Attempting to reconnect %s. Connection Attempt Count = %d.\n",
-    p_hba->ioc_info.profile.id_string,
-    retry_count) );
-
-  SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
-   ("Creating New Session For Service Entry Index %d.\n",
-   p_hba->ioc_info.profile.num_svc_entries));
-  p_srp_session = srp_new_session(
-   p_hba, &p_hba->p_svc_entries[i], &status );
-  if ( p_srp_session == NULL )
-  {
-   status = IB_INSUFFICIENT_MEMORY;
-   break;
-  }
-
-  SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
-   ("New Session For Service Entry Index %d Created.\n",
-   p_hba->ioc_info.profile.num_svc_entries));
-  SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
-   ("Logging Into Session.\n"));
-  status = srp_session_login( p_srp_session );
-  if ( status == IB_SUCCESS )
-  {
-   if ( p_hba->max_sg > p_srp_session->connection.max_scatter_gather_entries )
-   {
-    p_hba->max_sg = p_srp_session->connection.max_scatter_gather_entries;
-   }
-
-   if ( p_hba->max_srb_ext_sz > p_srp_session->connection.init_to_targ_iu_sz )
-   {
-    p_hba->max_srb_ext_sz =
-     sizeof( srp_send_descriptor_t ) -
-     SRP_MAX_IU_SIZE +
-     p_srp_session->connection.init_to_targ_iu_sz;
-   }
-
-   cl_obj_lock( &p_hba->obj );
-   p_hba->session_list[i] = p_srp_session;
-   cl_obj_unlock( &p_hba->obj );
-
-   SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
-    ("Session Login Issued Successfully.\n"));
-  }
-  else
-  {
-   SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
-    ("Session Login Failure Status = %d.\n", status));
-   SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
-     ("Session Object ref_cnt = %d\n", p_srp_session->obj.ref_cnt) );
-   cl_obj_destroy( &p_srp_session->obj );
-  }
- } while ( (status != IB_SUCCESS) && (retry_count < 3) );
-
- if ( status == IB_SUCCESS )
- {
-  SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
-    ("Resuming Adapter for %s.\n", p_hba->ioc_info.profile.id_string) );
-  p_hba->adapter_paused = FALSE;
-  StorPortReady( p_hba->p_ext );
-//  StorPortNotification( BusChangeDetected, p_hba->p_ext, 0 );
- }
-
+ 
  SRP_EXIT( SRP_DBG_PNP );
+ return ;
 }
 
 /* __srp_cm_reply_cb */
 
 
 
 


________________________________

	From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Sufficool, Stanley
	Sent: Sunday, December 30, 2007 12:37 AM
	To: ofw at lists.openfabrics.org
	Subject: [ofw] IPoIB crashes with SRP bad shutdown?
	
	
	Has anyone had experience with IPoIB dying when an SRP initiator
	is not manually ejected before the target is shutdown.
	 
	I seem to be able to reproduce this condition regularly with the
	OFED SRP Target and WinOF SRP initiator.
	 
	How is IPoIB related to SRP Initiator? Why would one crash the
	other?
