[ofw] RE: [PATCH] SRP dreq deadlock

Fab Tillier ftillier at windows.microsoft.com
Thu Feb 14 09:09:08 PST 2008


Why does it block in the first place?  Is it because it blocks from the FindAdapter routine?  Could the driver change to only block there, and otherwise handle connection/disconnection asynchronously?

-Fab

-----Original Message-----
From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Thursday, February 14, 2008 9:07 AM
To: Alex Estrin; Fab Tillier; Yossi Leybovich; ofw at lists.openfabrics.org
Subject: RE: [ofw] RE: [PATCH] SRP dreq deadlock

I agree that offloading of the reconnect to another thread is the right
solution.
I'll wait for Alex's patch.

> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> Sent: Thursday, February 14, 2008 6:29 PM
> To: Fab Tillier; Yossi Leybovich; Leonid Keller;
> ofw at lists.openfabrics.org
> Subject: RE: [ofw] RE: [PATCH] SRP dreq deadlock
>
> Fab,
>
> I've implemented session recovery/reconnect attempts
> offloaded to a separate thread.
> I'm planning to finish and submit the patch for review next week.
>
> Thanks,
>
> Alex.
>
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> > bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > Sent: Thursday, February 14, 2008 10:58 AM
> > To: Yossi Leybovich; Leonid Keller; ofw at lists.openfabrics.org
> > Subject: [ofw] RE: [PATCH] SRP dreq deadlock
> >
> > Will the SRP driver ever try to reconnect?  It seems that it won't
> unless
> > the IOC is first detected as removed from the fabric and then
> discovered
> > again, or an admin manually disables and re-enables the SRP device.
> >
> > While blocking the CM callback thread is surely a problem
> (I'll spare
> > everyone the rant about making things asynchronous blah
> blah blah) the
> > solution should go a bit further than just commenting out the
> problematic
> > code and fix it so that the intent is preserved (unless the intent
> itself
> > is incorrect).
> >
> > -Fab
> >
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org [mailto:ofw-
> > bounces at lists.openfabrics.org] On Behalf Of Yossi Leybovich
> > Sent: Thursday, February 14, 2008 2:58 AM
> > To: Leonid Keller; ofw at lists.openfabrics.org
> > Subject: [ofw] [PATCH] SRP dreq deadlock
> >
> > Leonid
> >
> > While working Windows SRP with Linux SRPT (OFED) I discover
> that after
> > removing the SRPT the windows side stop receiving MADs.
> > The problem is that the SRP while handling the dreq callback of the
> target
> > try to reconnect and wait (in the context of the callback thread) to
> the
> > connect operation to end.
> > This is deadlock as the operation will not finish till SRP
> release the
> > callback thread.
> >
> > This patch remove the reconnect code from the dreq callback.
> >
> > Thanks
> > Yossi
> > Index: srp_connection.c
> > ===================================================================
> > --- srp_connection.c    (revision 2166)
> > +++ srp_connection.c    (working copy)
> > @@ -285,8 +285,8 @@
> >         ib_cm_drep_t    cm_drep;
> >         ib_api_status_t status;
> >         int             i;
> > -       int             retry_count = 0;
> >
> > +
> >         SRP_ENTER( SRP_DBG_PNP );
> >
> >         SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> @@ -334,75
> > +334,9 @@
> >         SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> >                 ("Session Object ref_cnt = %d\n",
> > p_srp_session->obj.ref_cnt) );
> >         cl_obj_destroy( &p_srp_session->obj );
> > -
> > -       do
> > -       {
> > -               retry_count++;
> > -
> > -               SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> > -                       ("Attempting to reconnect %s. Connection
> Attempt
> > Count = %d.\n",
> > -                        p_hba->ioc_info.profile.id_string,
> > -                        retry_count) );
> > -
> > -               SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > -                       ("Creating New Session For Service
> Entry Index
> > %d.\n",
> > -                       p_hba->ioc_info.profile.num_svc_entries));
> > -               p_srp_session = srp_new_session(
> > -                       p_hba, &p_hba->p_svc_entries[i], &status );
> > -               if ( p_srp_session == NULL )
> > -               {
> > -                       status = IB_INSUFFICIENT_MEMORY;
> > -                       break;
> > -               }
> > -
> > -               SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > -                       ("New Session For Service Entry Index %d
> > Created.\n",
> > -                       p_hba->ioc_info.profile.num_svc_entries));
> > -               SRP_PRINT( TRACE_LEVEL_VERBOSE, SRP_DBG_DEBUG,
> > -                       ("Logging Into Session.\n"));
> > -               status = srp_session_login( p_srp_session );
> > -               if ( status == IB_SUCCESS )
> > -               {
> > -                       if ( p_hba->max_sg >
> > p_srp_session->connection.max_scatter_gather_entries )
> > -                       {
> > -                               p_hba->max_sg =
> > p_srp_session->connection.max_scatter_gather_entries;
> > -                       }
> > -
> > -                       if ( p_hba->max_srb_ext_sz >
> > p_srp_session->connection.init_to_targ_iu_sz )
> > -                       {
> > -                               p_hba->max_srb_ext_sz =
> > -                                       sizeof(
> srp_send_descriptor_t
> )
> > -
> > -                                       SRP_MAX_IU_SIZE +
> > -
> > p_srp_session->connection.init_to_targ_iu_sz;
> > -                       }
> > -
> > -                       cl_obj_lock( &p_hba->obj );
> > -                       p_hba->session_list[i] = p_srp_session;
> > -                       cl_obj_unlock( &p_hba->obj );
> > -
> > -                       SRP_PRINT( TRACE_LEVEL_VERBOSE,
> SRP_DBG_DEBUG,
> > -                               ("Session Login Issued
> > Successfully.\n"));
> > -               }
> > -               else
> > -               {
> > -                       SRP_PRINT( TRACE_LEVEL_ERROR, SRP_DBG_ERROR,
> > -                               ("Session Login Failure Status =
> %d.\n",
> > status));
> > -                       SRP_PRINT( TRACE_LEVEL_VERBOSE,
> SRP_DBG_DEBUG,
> > -                                ("Session Object ref_cnt = %d\n",
> > p_srp_session->obj.ref_cnt) );
> > -                       cl_obj_destroy( &p_srp_session->obj );
> > -               }
> > -       } while ( (status != IB_SUCCESS) && (retry_count < 3) );
> > -
> > -       if ( status == IB_SUCCESS )
> > -       {
> > -               SRP_PRINT( TRACE_LEVEL_INFORMATION, SRP_DBG_DEBUG,
> > -                        ("Resuming Adapter for %s.\n",
> > p_hba->ioc_info.profile.id_string) );
> > -               p_hba->adapter_paused = FALSE;
> > -               StorPortReady( p_hba->p_ext );
> > -//             StorPortNotification( BusChangeDetected,
> p_hba->p_ext,
> 0
> > );
> > -       }
> > -
> > +
> >         SRP_EXIT( SRP_DBG_PNP );
> > +       return ;
> >  }
> >
> >  /* __srp_cm_reply_cb */
> >
> >
> >
> > _______________________________________________
> > ofw mailing list
> > ofw at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>
_______________________________________________
ofw mailing list
ofw at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw



More information about the ofw mailing list