[Openib-windows] RE: duplicate socket deadlock in WSD

Yossi Leybovich sleybo at mellanox.co.il
Sun Oct 30 00:38:52 PDT 2005


Hi

I have not tried the patch yet, but from reading it I can see that you call
wait_cq_drain without holding sock_info->mutex.
Isn't it a problem to check the counters without locking the sock_info first?
If it is safe to check the counters without locking, then the patch looks good;
if not, you need to add an acquire and release inside the busy wait.
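
A minimal sketch of the second option (acquire and release inside the busy
wait) might look like the following. It is not the WSD code: the struct, the
counter names send_cnt and recv_cnt, and every function here are hypothetical,
and a pthread mutex stands in for the cl_spinlock used in the patch. The point
is only that the lock is held for each counter check and dropped between
checks, so a completion handler that needs the same lock can still run.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket state; not the real WSD struct. */
struct sock_sketch {
    pthread_mutex_t mutex;  /* plays the role of sock_info->mutex */
    int send_cnt;           /* outstanding send requests (assumed name) */
    int recv_cnt;           /* outstanding receive requests (assumed name) */
};

void sock_sketch_init(struct sock_sketch *s, int sends, int recvs)
{
    assert(s != NULL);
    pthread_mutex_init(&s->mutex, NULL);
    s->send_cnt = sends;
    s->recv_cnt = recvs;
}

/* What a completion handler would do: take the lock, retire one send. */
void complete_send(struct sock_sketch *s)
{
    pthread_mutex_lock(&s->mutex);
    if (s->send_cnt > 0)
        s->send_cnt--;
    pthread_mutex_unlock(&s->mutex);
}

/* Same as above, for one receive. */
void complete_recv(struct sock_sketch *s)
{
    pthread_mutex_lock(&s->mutex);
    if (s->recv_cnt > 0)
        s->recv_cnt--;
    pthread_mutex_unlock(&s->mutex);
}

/* Read both counters under the lock, then release it before returning. */
bool counters_drained(struct sock_sketch *s)
{
    bool drained;

    pthread_mutex_lock(&s->mutex);
    drained = (s->send_cnt == 0 && s->recv_cnt == 0);
    pthread_mutex_unlock(&s->mutex);
    return drained;
}

/*
 * Busy wait that never holds the lock across iterations.
 * The caller must NOT hold s->mutex when calling this, otherwise the
 * completion handlers above would block and the loop would never end.
 */
void wait_cq_drain_sketch(struct sock_sketch *s)
{
    while (!counters_drained(s))
        sched_yield();
}
```

With this shape the caller releases the mutex before the wait and reacquires
it afterwards, as the patch already does around wait_cq_drain.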

Yossi 

> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at silverstorm.com] 
> Sent: Thursday, October 27, 2005 7:45 PM
> To: 'Yossi Leybovich'
> Cc: openib-windows at openib.org
> Subject: RE: duplicate socket deadlock in WSD
> 
> 
> Hi Yossi,
> 
> > From: Yossi Leybovich [mailto:sleybo at mellanox.co.il]
> > Sent: Thursday, October 27, 2005 10:10 AM
> > 
> > I think there is a deadlock in the duplicate socket flow. The duplicate
> > socket code calls wait_cq_drain with sock_info->mutex acquired
> > (ibsp_duplicate.c line 313), and the busy-wait loop (in the
> > wait_cq_drain function) does not release the mutex while it waits
> > for the counters to reach 0. But in the completion function
> > (copletion_wq), in the flush-in-error case, the code tries to acquire
> > the mutex, so the completion function cannot continue and we are
> > in deadlock.
> 
> Can you try the following patch and let me know if it 
> resolves things?  If so, I'll commit.
> 
> Thanks,
> 
> - Fab
> 
> Index: ulp/wsd/user/ib_cm.c 
> ===================================================================
> --- ulp/wsd/user/ib_cm.c	(revision 127)
> +++ ulp/wsd/user/ib_cm.c	(working copy)
> @@ -156,12 +156,14 @@
>  		{
>  			int ret;
>  
> -			wait_cq_drain( socket_info );
> -
>  			/* Non-blocking cancel since we're in CM callback context */
>  			ib_cm_cancel( socket_info->listen.handle, NULL );
>  			socket_info->listen.handle = NULL;
> +			cl_spinlock_release( &socket_info->mutex );
>  
> +			wait_cq_drain( socket_info );
> +
> +			cl_spinlock_acquire( &socket_info->mutex );
>  			ret = ib_accept( socket_info, p_cm_req_rec );
>  			if( ret )
>  			{
> Index: ulp/wsd/user/ibsp_duplicate.c 
> ===================================================================
> --- ulp/wsd/user/ibsp_duplicate.c	(revision 127)
> +++ ulp/wsd/user/ibsp_duplicate.c	(working copy)
> @@ -310,10 +310,10 @@
>  
>  	cl_spinlock_release( &socket_info->mutex );
>  	ib_disconnect( socket_info, &reason );
> -	cl_spinlock_acquire( &socket_info->mutex );
>  
>  	wait_cq_drain( socket_info );
>  
> +	cl_spinlock_acquire( &socket_info->mutex );
>  	ib_destroy_socket( socket_info );
>  
>  	/* Put enough info in dup_info so that the remote socket can recreate the connection. */
> Index: ulp/wsd/user/ibsp_iblow.c 
> ===================================================================
> --- ulp/wsd/user/ibsp_iblow.c	(revision 127)
> +++ ulp/wsd/user/ibsp_iblow.c	(working copy)
> @@ -127,6 +127,7 @@
>  			cl_spinlock_release( &socket_info->recv_lock );
>  
>  			cl_spinlock_release( &socket_info->mutex );
> +			p_io_info->p_ov = NULL;
>  			IBSP_EXIT( IBSP_DBG_IO );
>  			return;
>  		}
> 