[ofw] RE: [PATCH] SRP dreq deadlock
Fab Tillier
ftillier at windows.microsoft.com
Thu Feb 14 07:58:05 PST 2008
Will the SRP driver ever try to reconnect? It seems that it won't unless the IOC is first detected as removed from the fabric and then discovered again, or an admin manually disables and re-enables the SRP device.
While blocking the CM callback thread is surely a problem (I'll spare everyone the rant about making things asynchronous blah blah blah), the solution should go a bit further than just commenting out the problematic code: it should fix the code so that the intent is preserved (unless the intent itself is incorrect).
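One way the intent could be preserved, as a rough sketch only: the DREQ callback just records that a reconnect is needed and returns, and a separate worker (not the CM callback thread) runs the bounded retry loop, where waiting is safe. Everything named here (hba_model_t, on_dreq, reconnect_worker, the login hook) is a hypothetical stand-in and not the driver's or IBAL's API. Pausing the adapter in the callback is also an assumption. The limit of 3 attempts mirrors the retry loop that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECONNECT_ATTEMPTS 3   /* same bound as the removed retry loop */

/* Hypothetical login hook; stands in for creating a session and logging in. */
typedef bool (*login_fn)(void *ctx);

typedef struct {
    bool     reconnect_pending;  /* set by the DREQ callback, cleared by the worker */
    bool     adapter_paused;     /* assumption: adapter is paused while disconnected */
    int      attempts;           /* attempts made by the last worker run */
    login_fn login;
    void    *login_ctx;
} hba_model_t;

/* DREQ callback: record the request and return at once; never block here. */
static void on_dreq(hba_model_t *hba)
{
    hba->adapter_paused = true;
    hba->reconnect_pending = true;
}

/* Meant to run later, outside the CM callback thread, so it may wait for
 * each login attempt to complete. Returns true once a login succeeds. */
static bool reconnect_worker(hba_model_t *hba)
{
    assert(hba->login != NULL);
    if (!hba->reconnect_pending)
        return false;

    hba->reconnect_pending = false;
    hba->attempts = 0;
    while (hba->attempts < MAX_RECONNECT_ATTEMPTS) {
        hba->attempts++;
        if (hba->login(hba->login_ctx)) {
            hba->adapter_paused = false;  /* the driver would resume the adapter here */
            return true;
        }
    }
    return false;                         /* adapter stays paused */
}

/* Demo login: fails until the counter behind ctx reaches zero. */
static bool flaky_login(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

In a real driver the worker would be scheduled through whatever deferred-work mechanism is available; that part is deliberately left out of the sketch.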
-Fab
-----Original Message-----
From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Yossi Leybovich
Sent: Thursday, February 14, 2008 2:58 AM
To: Leonid Keller; ofw at lists.openfabrics.org
Subject: [ofw] [PATCH] SRP dreq deadlock
Leonid
While working with Windows SRP against Linux SRPT (OFED), I discovered that after the SRPT is removed, the Windows side stops receiving MADs.
The problem is that SRP, while handling the target's DREQ callback, tries to reconnect and waits (in the context of the callback thread) for the connect operation to finish.
This is a deadlock, because the operation cannot finish until SRP releases the callback thread.
This patch removes the reconnect code from the DREQ callback.
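The deadlock described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes a single callback thread that delivers CM events one at a time, and all names (cm_model_t, cm_post, cm_dispatch_one, dreq_wait_hangs) are hypothetical and not part of IBAL or the SRP driver. The wait inside the callback is bounded to 1000 polls so the demo terminates; in the driver the wait has no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

typedef enum { EV_DREQ, EV_CONNECT_DONE } cm_event_t;

/* Toy CM with ONE callback thread (an assumption made for this sketch). */
typedef struct {
    cm_event_t queue[QUEUE_CAP];
    size_t     head, tail;
    bool       in_callback;   /* the callback thread is occupied */
    bool       connected;
    bool       wait_failed;   /* the in-callback wait gave up */
} cm_model_t;

static void cm_post(cm_model_t *cm, cm_event_t ev)
{
    assert(cm->tail < QUEUE_CAP);
    cm->queue[cm->tail++] = ev;
}

/* Delivers one queued event unless a callback already holds the thread. */
static bool cm_dispatch_one(cm_model_t *cm, bool wait_in_callback)
{
    if (cm->in_callback || cm->head == cm->tail)
        return false;                  /* thread busy, or nothing queued */

    cm_event_t ev = cm->queue[cm->head++];
    cm->in_callback = true;
    if (ev == EV_DREQ) {
        cm->connected = false;
        cm_post(cm, EV_CONNECT_DONE);  /* reconnect issued; its reply is queued */
        if (wait_in_callback) {
            /* Bounded stand-in for the blocking wait. Each poll asks the
             * dispatcher to deliver the reply, which it cannot do while
             * this callback is still running. */
            for (int i = 0; i < 1000 && !cm->connected; i++)
                cm_dispatch_one(cm, false);
            cm->wait_failed = !cm->connected;
        }
    } else {
        cm->connected = true;
    }
    cm->in_callback = false;
    return true;
}

/* Returns true if the wait inside the DREQ callback could never complete. */
static bool dreq_wait_hangs(bool wait_in_callback)
{
    cm_model_t cm = { .connected = true };
    cm_post(&cm, EV_DREQ);
    cm_dispatch_one(&cm, wait_in_callback);
    return cm.wait_failed;
}
```

In this model, dreq_wait_hangs(true) reports the hang. When the callback returns without waiting, the next call to cm_dispatch_one delivers the queued reply.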
Thanks
Yossi
Index: srp_connection.c
===================================================================
--- srp_connection.c (revision 2166)
+++ srp_connection.c (working copy)
@@ -285,8 +285,8 @@
ib_cm_drep_t cm_drep;
ib_api_status_t status;
int i;
- int retry_count = 0;
+
SRP_ENTER( SRP_DBG_PNP );
SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
@@ -334,75 +334,9 @@
SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
("Session Object ref_cnt = %d\n",
p_srp_session->obj.ref_cnt) );
cl_obj_destroy( &p_srp_session->obj );
-
- do
- {
- retry_count++;
-
- SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
- ("Attempting to reconnect %s. Connection Attempt Count = %d.\n",
- p_hba->ioc_info.profile.id_string,
- retry_count) );
-
- SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
- ("Creating New Session For Service Entry Index %d.\n",
- p_hba->ioc_info.profile.num_svc_entries));
- p_srp_session = srp_new_session(
- p_hba, &p_hba->p_svc_entries[i], &status );
- if ( p_srp_session == NULL )
- {
- status = IB_INSUFFICIENT_MEMORY;
- break;
- }
-
- SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
- ("New Session For Service Entry Index %d Created.\n",
- p_hba->ioc_info.profile.num_svc_entries));
- SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
- ("Logging Into Session.\n"));
- status = srp_session_login( p_srp_session );
- if ( status == IB_SUCCESS )
- {
- if ( p_hba->max_sg > p_srp_session->connection.max_scatter_gather_entries )
- {
- p_hba->max_sg = p_srp_session->connection.max_scatter_gather_entries;
- }
-
- if ( p_hba->max_srb_ext_sz > p_srp_session->connection.init_to_targ_iu_sz )
- {
- p_hba->max_srb_ext_sz =
- sizeof( srp_send_descriptor_t ) -
- SRP_MAX_IU_SIZE +
- p_srp_session->connection.init_to_targ_iu_sz;
- }
-
- cl_obj_lock( &p_hba->obj );
- p_hba->session_list[i] = p_srp_session;
- cl_obj_unlock( &p_hba->obj );
-
- SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
- ("Session Login Issued Successfully.\n"));
- }
- else
- {
- SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
- ("Session Login Failure Status = %d.\n", status));
- SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
- ("Session Object ref_cnt = %d\n", p_srp_session->obj.ref_cnt) );
- cl_obj_destroy( &p_srp_session->obj );
- }
- } while ( (status != IB_SUCCESS) && (retry_count < 3) );
-
- if ( status == IB_SUCCESS )
- {
- SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
- ("Resuming Adapter for %s.\n", p_hba->ioc_info.profile.id_string) );
- p_hba->adapter_paused = FALSE;
- StorPortReady( p_hba->p_ext );
-// StorPortNotification( BusChangeDetected, p_hba->p_ext, 0 );
- }
-
+
SRP_EXIT( SRP_DBG_PNP );
+ return ;
}
/* __srp_cm_reply_cb */