[ofw] [PATCH] SRP dreq deadlock
Usha Srinivasan
usha.srinivasan at qlogic.com
Mon Feb 18 07:58:25 PST 2008
Hi Gilad,
Alex will be sending an email soon describing the SRP patches. He'll
follow that up with patch submissions for the same. We need a little
time to get this ready; thanks for your patience!
Usha
-----Original Message-----
From: Gilad Shainer [mailto:Shainer at mellanox.com]
Sent: Sunday, February 17, 2008 10:35 AM
To: Usha Srinivasan
Cc: Yossi Leybovich; ofw at lists.openfabrics.org
Subject: RE: [ofw] [PATCH] SRP dreq deadlock
Usha,
Could you or Alex describe the issues that Alex's patches address? Others
may be facing the same issues and trying to solve the same problems.
Thanks,
Gilad.
> -----Original Message-----
> From: Usha Srinivasan [mailto:usha.srinivasan at qlogic.com]
> Sent: Thursday, February 14, 2008 5:07 PM
> To: Yossi Leybovich; Leonid Keller; ofw at lists.openfabrics.org
> Subject: RE: [ofw] [PATCH] SRP dreq deadlock
>
> Hi Yossi,
> Alex Estrin here at QLogic KOP has a couple of SRP patches pending to
> be merged into Win OF. One patch addresses this deadlock bug along
> with QP err handling and session recovery. A second patch will add a
> bit for flow control handling in order to withstand heavy load.
>
> Alex will be preparing his patches and will submit them to the community
> for review, hopefully next week.
>
> Usha
>
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Yossi Leybovich
> Sent: Thursday, February 14, 2008 5:58 AM
> To: Leonid Keller; ofw at lists.openfabrics.org
> Subject: [ofw] [PATCH] SRP dreq deadlock
>
> Leonid
>
> While working with Windows SRP against Linux SRPT (OFED), I discovered
> that after removing the SRPT the Windows side stops receiving MADs.
> The problem is that SRP, while handling the dreq callback of the
> target, tries to reconnect and waits (in the context of the callback
> thread) for the connect operation to finish.
> This is a deadlock, because the operation cannot finish until SRP
> releases the callback thread.
>
> This patch removes the reconnect code from the dreq callback.
>
> Thanks
> Yossi
> Index: srp_connection.c
> ===================================================================
> --- srp_connection.c (revision 2166)
> +++ srp_connection.c (working copy)
> @@ -285,8 +285,8 @@
> ib_cm_drep_t cm_drep;
> ib_api_status_t status;
> int i;
> - int retry_count = 0;
>
> +
> SRP_ENTER( SRP_DBG_PNP );
>
> SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> @@ -334,75 +334,9 @@
> SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> ("Session Object ref_cnt = %d\n", p_srp_session->obj.ref_cnt) );
> cl_obj_destroy( &p_srp_session->obj );
> -
> - do
> - {
> - retry_count++;
> -
> - SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> - ("Attempting to reconnect %s. Connection Attempt Count = %d.\n",
> - p_hba->ioc_info.profile.id_string,
> - retry_count) );
> -
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Creating New Session For Service Entry Index %d.\n",
> - p_hba->ioc_info.profile.num_svc_entries));
> - p_srp_session = srp_new_session(
> - p_hba, &p_hba->p_svc_entries[i], &status );
> - if ( p_srp_session == NULL )
> - {
> - status = IB_INSUFFICIENT_MEMORY;
> - break;
> - }
> -
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("New Session For Service Entry Index %d Created.\n",
> - p_hba->ioc_info.profile.num_svc_entries));
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Logging Into Session.\n"));
> - status = srp_session_login( p_srp_session );
> - if ( status == IB_SUCCESS )
> - {
> - if ( p_hba->max_sg > p_srp_session->connection.max_scatter_gather_entries )
> - {
> - p_hba->max_sg = p_srp_session->connection.max_scatter_gather_entries;
> - }
> -
> - if ( p_hba->max_srb_ext_sz > p_srp_session->connection.init_to_targ_iu_sz )
> - {
> - p_hba->max_srb_ext_sz =
> - sizeof( srp_send_descriptor_t ) -
> - SRP_MAX_IU_SIZE +
> - p_srp_session->connection.init_to_targ_iu_sz;
> - }
> -
> - cl_obj_lock( &p_hba->obj );
> - p_hba->session_list[i] = p_srp_session;
> - cl_obj_unlock( &p_hba->obj );
> -
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Session Login Issued Successfully.\n"));
> - }
> - else
> - {
> - SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
> - ("Session Login Failure Status = %d.\n", status));
> - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> - ("Session Object ref_cnt = %d\n", p_srp_session->obj.ref_cnt) );
> - cl_obj_destroy( &p_srp_session->obj );
> - }
> - } while ( (status != IB_SUCCESS) && (retry_count < 3) );
> -
> - if ( status == IB_SUCCESS )
> - {
> - SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> - ("Resuming Adapter for %s.\n", p_hba->ioc_info.profile.id_string) );
> - p_hba->adapter_paused = FALSE;
> - StorPortReady( p_hba->p_ext );
> -// StorPortNotification( BusChangeDetected, p_hba->p_ext, 0 );
> - }
> -
> +
> SRP_EXIT( SRP_DBG_PNP );
> + return ;
> }
>
> /* __srp_cm_reply_cb */
>
>
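The patch above only removes the reconnect from the dreq callback; it does not say where a reconnect should happen instead. One common way to avoid this kind of deadlock is to have the callback queue the reconnect and return immediately, so that the blocking work runs in another context. The following is a minimal sketch of that pattern, not the WinOF implementation: every name in it (work_queue, session, wq_push, wq_drain, reconnect_work, on_dreq) is hypothetical, and the actual session creation and login are replaced by a stand-in flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none are IBAL, complib, or SRP APIs. */

#define MAX_WORK 8

typedef void (*work_fn)(void *ctx);

typedef struct {
    work_fn fn;
    void   *ctx;
} work_item;

/* Fixed-size FIFO of deferred work items. */
typedef struct {
    work_item items[MAX_WORK];
    size_t    head;
    size_t    count;
} work_queue;

typedef struct {
    bool        connected;
    int         reconnect_attempts;
    work_queue *deferred;   /* drained outside the callback context */
} session;

/* Queue a work item; returns false if the queue is full. */
bool wq_push(work_queue *q, work_fn fn, void *ctx)
{
    assert(q != NULL && fn != NULL);
    if (q->count == MAX_WORK)
        return false;
    size_t tail = (q->head + q->count) % MAX_WORK;
    q->items[tail].fn = fn;
    q->items[tail].ctx = ctx;
    q->count++;
    return true;
}

/* Run every queued item in FIFO order; returns how many were run.
 * In a real driver this would be called from a worker context,
 * never from the CM callback thread. */
size_t wq_drain(work_queue *q)
{
    size_t ran = 0;
    assert(q != NULL);
    while (q->count > 0) {
        work_item it = q->items[q->head];
        q->head = (q->head + 1) % MAX_WORK;
        q->count--;
        it.fn(it.ctx);
        ran++;
    }
    return ran;
}

/* Stand-in for "create a new session and log in". This step may
 * block, which is why it must not run inside the dreq callback. */
void reconnect_work(void *ctx)
{
    session *s = ctx;
    assert(s != NULL);
    s->reconnect_attempts++;
    s->connected = true;
}

/* Disconnect-request callback: mark the session down, queue the
 * reconnect, and return without waiting for it to complete. */
bool on_dreq(session *s)
{
    assert(s != NULL && s->deferred != NULL);
    s->connected = false;
    return wq_push(s->deferred, reconnect_work, s);
}
```

With this structure the callback thread is released as soon as on_dreq returns, so a connect operation that itself depends on callbacks can make progress once the queue is drained elsewhere. Whether a later SRP patch takes this approach is not stated in the thread.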