[Openib-windows] RE: duplicate socket deadlock in WSD
Yossi Leybovich
sleybo at mellanox.co.il
Sun Oct 30 00:38:52 PDT 2005
Hi,
I have not tried the patch yet, but from reading it I can see that you call
wait_cq_drain without holding sock_info->mutex.
Isn't it a problem to check the counters without locking sock_info first?
If it is safe to check the counters without the lock, then the patch looks
good; if not, you need to add an acquire and release inside the busy wait.
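
To make the second option concrete, here is a minimal sketch of a busy wait
that takes the lock only for each check of the counter. It is not the WSD
code: pthreads stand in for the complib spinlock, and demo_sock, pending,
completion_thread, wait_drain and run_drain_demo are hypothetical names used
only for illustration.

```c
/* Sketch only: a pthread mutex plays the role of sock_info->mutex,
 * and every name in this file is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

struct demo_sock {
    pthread_mutex_t mutex;  /* stands in for sock_info->mutex */
    int pending;            /* completions that have not been reaped yet */
};

/* Stands in for the completion callback. It needs the mutex for every
 * completion, so it only makes progress while the waiter is not
 * holding the mutex. */
static void *completion_thread(void *arg)
{
    struct demo_sock *s = arg;

    for (;;) {
        pthread_mutex_lock(&s->mutex);
        assert(s->pending >= 0);
        if (s->pending == 0) {
            pthread_mutex_unlock(&s->mutex);
            return NULL;
        }
        s->pending--;
        pthread_mutex_unlock(&s->mutex);
    }
}

/* Busy wait that acquires and releases the mutex around each check,
 * instead of holding it for the whole loop. */
static void wait_drain(struct demo_sock *s)
{
    for (;;) {
        int left;

        pthread_mutex_lock(&s->mutex);
        left = s->pending;
        pthread_mutex_unlock(&s->mutex);

        if (left == 0)
            return;
        sched_yield();  /* give the completion thread a chance to run */
    }
}

/* Starts the completion thread, waits for the counter to drain, and
 * returns the final counter value (or -1 if the thread cannot start). */
int run_drain_demo(int outstanding)
{
    struct demo_sock s;
    pthread_t t;
    int left;

    pthread_mutex_init(&s.mutex, NULL);
    s.pending = outstanding;
    if (pthread_create(&t, NULL, completion_thread, &s) != 0) {
        pthread_mutex_destroy(&s.mutex);
        return -1;
    }

    wait_drain(&s);
    pthread_join(t, NULL);

    left = s.pending;
    pthread_mutex_destroy(&s.mutex);
    return left;
}
```

If wait_drain held the mutex across the whole loop, completion_thread could
never decrement the counter and the wait would not end, which is the deadlock
described in the quoted message below.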
Yossi
> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at silverstorm.com]
> Sent: Thursday, October 27, 2005 7:45 PM
> To: 'Yossi Leybovich'
> Cc: openib-windows at openib.org
> Subject: RE: duplicate socket deadlock in WSD
>
>
> Hi Yossi,
>
> > From: Yossi Leybovich [mailto:sleybo at mellanox.co.il]
> > Sent: Thursday, October 27, 2005 10:10 AM
> >
> > I think there is a deadlock in the duplicate socket flow. The duplicate
> > socket code calls wait_cq_drain with sock_info->mutex acquired
> > (ibsp_duplicate.c line 313), and the busy-wait loop (in the
> > wait_cq_drain function) does not release the mutex while it waits
> > for the counters to reach 0. But in the completion function
> > (copletion_wq), in the flush-in-error case, the code tries to acquire
> > the mutex, so the completion function cannot continue and we are in
> > deadlock.
>
> Can you try the following patch and let me know if it
> resolves things? If so, I'll commit.
>
> Thanks,
>
> - Fab
>
> Index: ulp/wsd/user/ib_cm.c
> ===================================================================
> --- ulp/wsd/user/ib_cm.c (revision 127)
> +++ ulp/wsd/user/ib_cm.c (working copy)
> @@ -156,12 +156,14 @@
> {
> int ret;
>
> - wait_cq_drain( socket_info );
> -
> /* Non-blocking cancel since we're in CM callback context */
> ib_cm_cancel( socket_info->listen.handle, NULL );
> socket_info->listen.handle = NULL;
> + cl_spinlock_release( &socket_info->mutex );
>
> + wait_cq_drain( socket_info );
> +
> + cl_spinlock_acquire( &socket_info->mutex );
> ret = ib_accept( socket_info, p_cm_req_rec );
> if( ret )
> {
> Index: ulp/wsd/user/ibsp_duplicate.c
> ===================================================================
> --- ulp/wsd/user/ibsp_duplicate.c (revision 127)
> +++ ulp/wsd/user/ibsp_duplicate.c (working copy)
> @@ -310,10 +310,10 @@
>
> cl_spinlock_release( &socket_info->mutex );
> ib_disconnect( socket_info, &reason );
> - cl_spinlock_acquire( &socket_info->mutex );
>
> wait_cq_drain( socket_info );
>
> + cl_spinlock_acquire( &socket_info->mutex );
> ib_destroy_socket( socket_info );
>
> /* Put enough info in dup_info so that the remote socket can recreate the connection. */
> Index: ulp/wsd/user/ibsp_iblow.c
> ===================================================================
> --- ulp/wsd/user/ibsp_iblow.c (revision 127)
> +++ ulp/wsd/user/ibsp_iblow.c (working copy)
> @@ -127,6 +127,7 @@
> cl_spinlock_release( &socket_info->recv_lock );
>
> cl_spinlock_release( &socket_info->mutex );
> + p_io_info->p_ov = NULL;
> IBSP_EXIT( IBSP_DBG_IO );
> return;
> }
>