[ofw] RE: [PATCH] SRP dreq deadlock

Leonid Keller leonid at mellanox.co.il
Thu Feb 14 09:06:48 PST 2008


I agree that offloading of the reconnect to another thread is the right
solution.
I'll wait for Alex's patch. 

> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com] 
> Sent: Thursday, February 14, 2008 6:29 PM
> To: Fab Tillier; Yossi Leybovich; Leonid Keller; 
> ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: [PATCH] SRP dreq deadlock
> 
> Fab,
> 
> I've implemented session recovery/reconnect attempts 
> offloaded to a separate thread.
> I'm planning to finish and submit the patch for review next week.
> 
> Thanks,
> 
> Alex.
> 
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org [mailto:ofw- 
> > bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > Sent: Thursday, February 14, 2008 10:58 AM
> > To: Yossi Leybovich; Leonid Keller; ofw at lists.openfabrics.org
> > Subject: [ofw] RE: [PATCH] SRP dreq deadlock
> > 
> > Will the SRP driver ever try to reconnect?  It seems that it won't
> unless
> > the IOC is first detected as removed from the fabric and then
> discovered
> > again, or an admin manually disables and re-enables the SRP device.
> > 
> > While blocking the CM callback thread is surely a problem 
> (I'll spare 
> > everyone the rant about making things asynchronous blah 
> blah blah) the 
> > solution should go a bit further than just commenting out the
> problematic
> > code and fix it so that the intent is preserved (unless the intent
> itself
> > is incorrect).
> > 
> > -Fab
> > 
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org [mailto:ofw- 
> > bounces at lists.openfabrics.org] On Behalf Of Yossi Leybovich
> > Sent: Thursday, February 14, 2008 2:58 AM
> > To: Leonid Keller; ofw at lists.openfabrics.org
> > Subject: [ofw] [PATCH] SRP dreq deadlock
> > 
> > Leonid
> > 
> > While working Windows SRP with Linux SRPT (OFED) I discover 
> that after 
> > removing the SRPT the windows side stop receiving MADs.
> > The problem is that the SRP while handling the dreq callback of the
> target
> > try to reconnect and wait (in the context of the callback thread) to
> the
> > connect operation to end.
> > This is deadlock as the operation will not finish till SRP 
> release the 
> > callback thread.
> > 
> > This patch remove the reconnect code from the dreq callback.
> > 
> > Thanks
> > Yossi
> > Index: srp_connection.c
> > ===================================================================
> > --- srp_connection.c    (revision 2166)
> > +++ srp_connection.c    (working copy)
> > @@ -285,8 +285,8 @@
> >         ib_cm_drep_t    cm_drep;
> >         ib_api_status_t status;
> >         int             i;
> > -       int             retry_count = 0;
> > 
> > +
> >         SRP_ENTER( SRP_DBG_PNP );
> > 
> >         SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG, 
> @@ -334,75
> > +334,9 @@
> >         SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> >                 ("Session Object ref_cnt = %d\n",
> > p_srp_session->obj.ref_cnt) );
> >         cl_obj_destroy( &p_srp_session->obj );
> > -
> > -       do
> > -       {
> > -               retry_count++;
> > -
> > -               SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> > -                       ("Attempting to reconnect %s. Connection
> Attempt
> > Count = %d.\n",
> > -                        p_hba->ioc_info.profile.id_string,
> > -                        retry_count) );
> > -
> > -               SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > -                       ("Creating New Session For Service 
> Entry Index
> > %d.\n",
> > -                       p_hba->ioc_info.profile.num_svc_entries));
> > -               p_srp_session = srp_new_session(
> > -                       p_hba, &p_hba->p_svc_entries[i], &status );
> > -               if ( p_srp_session == NULL )
> > -               {
> > -                       status = IB_INSUFFICIENT_MEMORY;
> > -                       break;
> > -               }
> > -
> > -               SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > -                       ("New Session For Service Entry Index %d
> > Created.\n",
> > -                       p_hba->ioc_info.profile.num_svc_entries));
> > -               SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > -                       ("Logging Into Session.\n"));
> > -               status = srp_session_login( p_srp_session );
> > -               if ( status == IB_SUCCESS )
> > -               {
> > -                       if ( p_hba->max_sg >
> > p_srp_session->connection.max_scatter_gather_entries )
> > -                       {
> > -                               p_hba->max_sg =
> > p_srp_session->connection.max_scatter_gather_entries;
> > -                       }
> > -
> > -                       if ( p_hba->max_srb_ext_sz >
> > p_srp_session->connection.init_to_targ_iu_sz )
> > -                       {
> > -                               p_hba->max_srb_ext_sz =
> > -                                       sizeof( 
> srp_send_descriptor_t
> )
> > -
> > -                                       SRP_MAX_IU_SIZE +
> > -
> > p_srp_session->connection.init_to_targ_iu_sz;
> > -                       }
> > -
> > -                       cl_obj_lock( &p_hba->obj );
> > -                       p_hba->session_list[i] = p_srp_session;
> > -                       cl_obj_unlock( &p_hba->obj );
> > -
> > -                       SRP_PRINT( TRACE_LEVEL_VERBOSE, 
> SRP_DBG_DEBUG,
> > -                               ("Session Login Issued
> > Successfully.\n"));
> > -               }
> > -               else
> > -               {
> > -                       SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
> > -                               ("Session Login Failure Status =
> %d.\n",
> > status));
> > -                       SRP_PRINT( TRACE_LEVEL_VERBOSE, 
> SRP_DBG_DEBUG,
> > -                                ("Session Object ref_cnt = %d\n",
> > p_srp_session->obj.ref_cnt) );
> > -                       cl_obj_destroy( &p_srp_session->obj );
> > -               }
> > -       } while ( (status != IB_SUCCESS) && (retry_count < 3) );
> > -
> > -       if ( status == IB_SUCCESS )
> > -       {
> > -               SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> > -                        ("Resuming Adapter for %s.\n",
> > p_hba->ioc_info.profile.id_string) );
> > -               p_hba->adapter_paused = FALSE;
> > -               StorPortReady( p_hba->p_ext );
> > -//             StorPortNotification( BusChangeDetected, 
> p_hba->p_ext,
> 0
> > );
> > -       }
> > -
> > +
> >         SRP_EXIT( SRP_DBG_PNP );
> > +       return ;
> >  }
> > 
> >  /* __srp_cm_reply_cb */
> > 
> > 
> > 
> > _______________________________________________
> > ofw mailing list
> > ofw at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 



More information about the ofw mailing list