[ofw] RE: [PATCH] SRP dreq deadlock
Leonid Keller
leonid at mellanox.co.il
Thu Feb 14 09:06:48 PST 2008
I agree that offloading of the reconnect to another thread is the right
solution.
I'll wait for Alex's patch.
> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> Sent: Thursday, February 14, 2008 6:29 PM
> To: Fab Tillier; Yossi Leybovich; Leonid Keller;
> ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: [PATCH] SRP dreq deadlock
>
> Fab,
>
> I've implemented session recovery/reconnect attempts
> offloaded to a separate thread.
> I'm planning to finish and submit the patch for review next week.
>
> Thanks,
>
> Alex.
>
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> > bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > Sent: Thursday, February 14, 2008 10:58 AM
> > To: Yossi Leybovich; Leonid Keller; ofw at lists.openfabrics.org
> > Subject: [ofw] RE: [PATCH] SRP dreq deadlock
> >
> > Will the SRP driver ever try to reconnect? It seems that it won't
> unless
> > the IOC is first detected as removed from the fabric and then
> discovered
> > again, or an admin manually disables and re-enables the SRP device.
> >
> > While blocking the CM callback thread is surely a problem
> (I'll spare
> > everyone the rant about making things asynchronous blah
> blah blah) the
> > solution should go a bit further than just commenting out the
> problematic
> > code and fix it so that the intent is preserved (unless the intent
> itself
> > is incorrect).
> >
> > -Fab
> >
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> > bounces at lists.openfabrics.org] On Behalf Of Yossi Leybovich
> > Sent: Thursday, February 14, 2008 2:58 AM
> > To: Leonid Keller; ofw at lists.openfabrics.org
> > Subject: [ofw] [PATCH] SRP dreq deadlock
> >
> > Leonid
> >
> > While working Windows SRP with Linux SRPT (OFED) I discover
> that after
> > removing the SRPT the windows side stop receiving MADs.
> > The problem is that the SRP while handling the dreq callback of the
> target
> > try to reconnect and wait (in the context of the callback thread) to
> the
> > connect operation to end.
> > This is deadlock as the operation will not finish till SRP
> release the
> > callback thread.
> >
> > This patch remove the reconnect code from the dreq callback.
> >
> > Thanks
> > Yossi
> > Index: srp_connection.c
> > ===================================================================
> > --- srp_connection.c (revision 2166)
> > +++ srp_connection.c (working copy)
> > @@ -285,8 +285,8 @@
> > ib_cm_drep_t cm_drep;
> > ib_api_status_t status;
> > int i;
> > - int retry_count = 0;
> >
> > +
> > SRP_ENTER( SRP_DBG_PNP );
> >
> > SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> @@ -334,75
> > +334,9 @@
> > SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > ("Session Object ref_cnt = %d\n",
> > p_srp_session->obj.ref_cnt) );
> > cl_obj_destroy( &p_srp_session->obj );
> > -
> > - do
> > - {
> > - retry_count++;
> > -
> > - SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> > - ("Attempting to reconnect %s. Connection
> Attempt
> > Count = %d.\n",
> > - p_hba->ioc_info.profile.id_string,
> > - retry_count) );
> > -
> > - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > - ("Creating New Session For Service
> Entry Index
> > %d.\n",
> > - p_hba->ioc_info.profile.num_svc_entries));
> > - p_srp_session = srp_new_session(
> > - p_hba, &p_hba->p_svc_entries[i], &status );
> > - if ( p_srp_session == NULL )
> > - {
> > - status = IB_INSUFFICIENT_MEMORY;
> > - break;
> > - }
> > -
> > - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > - ("New Session For Service Entry Index %d
> > Created.\n",
> > - p_hba->ioc_info.profile.num_svc_entries));
> > - SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > - ("Logging Into Session.\n"));
> > - status = srp_session_login( p_srp_session );
> > - if ( status == IB_SUCCESS )
> > - {
> > - if ( p_hba->max_sg >
> > p_srp_session->connection.max_scatter_gather_entries )
> > - {
> > - p_hba->max_sg =
> > p_srp_session->connection.max_scatter_gather_entries;
> > - }
> > -
> > - if ( p_hba->max_srb_ext_sz >
> > p_srp_session->connection.init_to_targ_iu_sz )
> > - {
> > - p_hba->max_srb_ext_sz =
> > - sizeof(
> srp_send_descriptor_t
> )
> > -
> > - SRP_MAX_IU_SIZE +
> > -
> > p_srp_session->connection.init_to_targ_iu_sz;
> > - }
> > -
> > - cl_obj_lock( &p_hba->obj );
> > - p_hba->session_list[i] = p_srp_session;
> > - cl_obj_unlock( &p_hba->obj );
> > -
> > - SRP_PRINT( TRACE_LEVEL_VERBOSE,
> SRP_DBG_DEBUG,
> > - ("Session Login Issued
> > Successfully.\n"));
> > - }
> > - else
> > - {
> > - SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
> > - ("Session Login Failure Status =
> %d.\n",
> > status));
> > - SRP_PRINT( TRACE_LEVEL_VERBOSE,
> SRP_DBG_DEBUG,
> > - ("Session Object ref_cnt = %d\n",
> > p_srp_session->obj.ref_cnt) );
> > - cl_obj_destroy( &p_srp_session->obj );
> > - }
> > - } while ( (status != IB_SUCCESS) && (retry_count < 3) );
> > -
> > - if ( status == IB_SUCCESS )
> > - {
> > - SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> > - ("Resuming Adapter for %s.\n",
> > p_hba->ioc_info.profile.id_string) );
> > - p_hba->adapter_paused = FALSE;
> > - StorPortReady( p_hba->p_ext );
> > -// StorPortNotification( BusChangeDetected,
> p_hba->p_ext,
> 0
> > );
> > - }
> > -
> > +
> > SRP_EXIT( SRP_DBG_PNP );
> > + return ;
> > }
> >
> > /* __srp_cm_reply_cb */
> >
> >
> >
> > _______________________________________________
> > ofw mailing list
> > ofw at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>
More information about the ofw
mailing list