[ofw] RE: [PATCH] SRP dreq deadlock
Alex Estrin
alex.estrin at qlogic.com
Thu Feb 14 08:29:16 PST 2008
Fab,
I've implemented session recovery/reconnect attempts
offloaded to a separate thread.
I'm planning to finish and submit the patch for review next week.
Thanks,
Alex.
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> Sent: Thursday, February 14, 2008 10:58 AM
> To: Yossi Leybovich; Leonid Keller; ofw at lists.openfabrics.org
> Subject: [ofw] RE: [PATCH] SRP dreq deadlock
>
> Will the SRP driver ever try to reconnect? It seems that it won't
unless
> the IOC is first detected as removed from the fabric and then
discovered
> again, or an admin manually disables and re-enables the SRP device.
>
> While blocking the CM callback thread is surely a problem (I'll spare
> everyone the rant about making things asynchronous blah blah blah) the
> solution should go a bit further than just commenting out the
problematic
> code and fix it so that the intent is preserved (unless the intent
itself
> is incorrect).
>
> -Fab
>
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> bounces at lists.openfabrics.org] On Behalf Of Yossi Leybovich
> Sent: Thursday, February 14, 2008 2:58 AM
> To: Leonid Keller; ofw at lists.openfabrics.org
> Subject: [ofw] [PATCH] SRP dreq deadlock
>
> Leonid
>
> While working Windows SRP with Linux SRPT (OFED) I discover that after
> removing the SRPT the windows side stop receiving MADs.
> The problem is that the SRP while handling the dreq callback of the
target
> try to reconnect and wait (in the context of the callback thread) to
the
> connect operation to end.
> This is deadlock as the operation will not finish till SRP release the
> callback thread.
>
> This patch remove the reconnect code from the dreq callback.
>
> Thanks
> Yossi
> Index: srp_connection.c
> ===================================================================
> --- srp_connection.c (revision 2166)
> +++ srp_connection.c (working copy)
> @@ -285,8 +285,8 @@
> ib_cm_drep_t cm_drep;
> ib_api_status_t status;
> int i;
> - int retry_count = 0;
>
> +
> SRP_ENTER( SRP_DBG_PNP );
>
> SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG, @@ -334,75
> +334,9 @@
> SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> ("Session Object ref_cnt = %d\n",
> p_srp_session->obj.ref_cnt) );
> cl_obj_destroy( &p_srp_session->obj );
> -
> - do
> - {
> - retry_count++;
> -
> - SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> - ("Attempting to reconnect %s. Connection
Attempt
> Count = %d.\n",
> - p_hba->ioc_info.profile.id_string,
> - retry_count) );
> -
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Creating New Session For Service Entry Index
> %d.\n",
> - p_hba->ioc_info.profile.num_svc_entries));
> - p_srp_session = srp_new_session(
> - p_hba, &p_hba->p_svc_entries[i], &status );
> - if ( p_srp_session == NULL )
> - {
> - status = IB_INSUFFICIENT_MEMORY;
> - break;
> - }
> -
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("New Session For Service Entry Index %d
> Created.\n",
> - p_hba->ioc_info.profile.num_svc_entries));
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Logging Into Session.\n"));
> - status = srp_session_login( p_srp_session );
> - if ( status == IB_SUCCESS )
> - {
> - if ( p_hba->max_sg >
> p_srp_session->connection.max_scatter_gather_entries )
> - {
> - p_hba->max_sg =
> p_srp_session->connection.max_scatter_gather_entries;
> - }
> -
> - if ( p_hba->max_srb_ext_sz >
> p_srp_session->connection.init_to_targ_iu_sz )
> - {
> - p_hba->max_srb_ext_sz =
> - sizeof( srp_send_descriptor_t
)
> -
> - SRP_MAX_IU_SIZE +
> -
> p_srp_session->connection.init_to_targ_iu_sz;
> - }
> -
> - cl_obj_lock( &p_hba->obj );
> - p_hba->session_list[i] = p_srp_session;
> - cl_obj_unlock( &p_hba->obj );
> -
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Session Login Issued
> Successfully.\n"));
> - }
> - else
> - {
> - SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
> - ("Session Login Failure Status =
%d.\n",
> status));
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Session Object ref_cnt = %d\n",
> p_srp_session->obj.ref_cnt) );
> - cl_obj_destroy( &p_srp_session->obj );
> - }
> - } while ( (status != IB_SUCCESS) && (retry_count < 3) );
> -
> - if ( status == IB_SUCCESS )
> - {
> - SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> - ("Resuming Adapter for %s.\n",
> p_hba->ioc_info.profile.id_string) );
> - p_hba->adapter_paused = FALSE;
> - StorPortReady( p_hba->p_ext );
> -// StorPortNotification( BusChangeDetected, p_hba->p_ext,
0
> );
> - }
> -
> +
> SRP_EXIT( SRP_DBG_PNP );
> + return ;
> }
>
> /* __srp_cm_reply_cb */
>
>
>
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
More information about the ofw
mailing list