<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>RE: duplicate socket deadlock in WSD</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Hi</FONT>
</P>
<P><FONT SIZE=2>I have not tried the patch yet, but from reading it I can see that you call wait_cq_drain</FONT>
<BR><FONT SIZE=2>without locking sock_info->mutex.</FONT>
<BR><FONT SIZE=2>Isn't it a problem to check the counters without locking the sock_info first?</FONT>
<BR><FONT SIZE=2>If it is safe to check the counters without locking, then the patch looks good. If not, you need to add an acquire and release inside the busy wait.</FONT></P>
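<P><FONT SIZE=2>To illustrate the second option, here is a minimal sketch of a busy wait that holds the lock only while it samples the counters, so the completion path can still take the lock between polls. It assumes pthreads instead of the complib cl_spinlock calls, and sock_state, send_cnt, recv_cnt, on_completion and wait_drain are hypothetical names, not the actual WSD code.</FONT></P>
<PRE>
```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for the WSD socket state; not the real ibsp structure. */
struct sock_state {
    pthread_mutex_t mutex;   /* plays the role of sock_info->mutex */
    int send_cnt;            /* outstanding send work requests */
    int recv_cnt;            /* outstanding receive work requests */
};

/* Completion path: takes the mutex before touching the counters,
 * as the flush-in-error case described in this thread does. */
static void on_completion(struct sock_state *s, int is_send)
{
    pthread_mutex_lock(&s->mutex);
    if (is_send)
        s->send_cnt--;
    else
        s->recv_cnt--;
    pthread_mutex_unlock(&s->mutex);
}

/* Drain wait: the mutex is held only while the counters are sampled,
 * never across the wait itself, so on_completion() is not blocked. */
static void wait_drain(struct sock_state *s)
{
    for (;;) {
        pthread_mutex_lock(&s->mutex);
        int pending = s->send_cnt + s->recv_cnt;
        pthread_mutex_unlock(&s->mutex);

        assert(pending >= 0);
        if (pending == 0)
            break;
        sched_yield();   /* give the completion thread a chance to run */
    }
}
```
</PRE>
<P><FONT SIZE=2>The caller must not hold the mutex when it enters wait_drain in this sketch, which matches the ordering in the patch (release, wait, then acquire).</FONT></P>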
<P><FONT SIZE=2>Yossi </FONT>
</P>
<P><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Fab Tillier [<A HREF="mailto:ftillier@silverstorm.com">mailto:ftillier@silverstorm.com</A>] </FONT>
<BR><FONT SIZE=2>> Sent: Thursday, October 27, 2005 7:45 PM</FONT>
<BR><FONT SIZE=2>> To: 'Yossi Leybovich'</FONT>
<BR><FONT SIZE=2>> Cc: openib-windows@openib.org</FONT>
<BR><FONT SIZE=2>> Subject: RE: duplicate socket deadlock in WSD</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Hi Yossi,</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > From: Yossi Leybovich [<A HREF="mailto:sleybo@mellanox.co.il">mailto:sleybo@mellanox.co.il</A>]</FONT>
<BR><FONT SIZE=2>> > Sent: Thursday, October 27, 2005 10:10 AM</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > I think there is a deadlock in the duplicate socket flow. Duplicate </FONT>
<BR><FONT SIZE=2>> > socket calls wait_cq_drain with sock_info->mutex acquired </FONT>
<BR><FONT SIZE=2>> > (ibsp_duplicate.c line 313), and the busy-wait loop in the </FONT>
<BR><FONT SIZE=2>> > wait_cq_drain function does not release the mutex while it waits </FONT>
<BR><FONT SIZE=2>> > for the counters to reach 0. But the completion function </FONT>
<BR><FONT SIZE=2>> > (copletion_wq), in the flush-in-error case, tries to acquire the mutex,</FONT>
<BR><FONT SIZE=2>> > so the completion function cannot continue and we are in deadlock.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Can you try the following patch and let me know if it </FONT>
<BR><FONT SIZE=2>> resolves things? If so, I'll commit.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Thanks,</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> - Fab</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Index: ulp/wsd/user/ib_cm.c </FONT>
<BR><FONT SIZE=2>> ===================================================================</FONT>
<BR><FONT SIZE=2>> --- ulp/wsd/user/ib_cm.c (revision 127)</FONT>
<BR><FONT SIZE=2>> +++ ulp/wsd/user/ib_cm.c (working copy)</FONT>
<BR><FONT SIZE=2>> @@ -156,12 +156,14 @@</FONT>
<BR><FONT SIZE=2>> {</FONT>
<BR><FONT SIZE=2>> int ret;</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> - wait_cq_drain( socket_info );</FONT>
<BR><FONT SIZE=2>> -</FONT>
<BR><FONT SIZE=2>> /* Non-blocking cancel since we're in CM callback context */</FONT>
<BR><FONT SIZE=2>> ib_cm_cancel( socket_info->listen.handle, NULL );</FONT>
<BR><FONT SIZE=2>> socket_info->listen.handle = NULL;</FONT>
<BR><FONT SIZE=2>> + cl_spinlock_release( &socket_info->mutex );</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> + wait_cq_drain( socket_info );</FONT>
<BR><FONT SIZE=2>> +</FONT>
<BR><FONT SIZE=2>> + cl_spinlock_acquire( &socket_info->mutex );</FONT>
<BR><FONT SIZE=2>> ret = ib_accept( socket_info, p_cm_req_rec );</FONT>
<BR><FONT SIZE=2>> if( ret )</FONT>
<BR><FONT SIZE=2>> {</FONT>
<BR><FONT SIZE=2>> Index: ulp/wsd/user/ibsp_duplicate.c </FONT>
<BR><FONT SIZE=2>> ===================================================================</FONT>
<BR><FONT SIZE=2>> --- ulp/wsd/user/ibsp_duplicate.c (revision 127)</FONT>
<BR><FONT SIZE=2>> +++ ulp/wsd/user/ibsp_duplicate.c (working copy)</FONT>
<BR><FONT SIZE=2>> @@ -310,10 +310,10 @@</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> cl_spinlock_release( &socket_info->mutex );</FONT>
<BR><FONT SIZE=2>> ib_disconnect( socket_info, &reason );</FONT>
<BR><FONT SIZE=2>> - cl_spinlock_acquire( &socket_info->mutex );</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> wait_cq_drain( socket_info );</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> + cl_spinlock_acquire( &socket_info->mutex );</FONT>
<BR><FONT SIZE=2>> ib_destroy_socket( socket_info );</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> /* Put enough info in dup_info so that the remote socket can recreate the connection. */</FONT>
<BR><FONT SIZE=2>> Index: ulp/wsd/user/ibsp_iblow.c </FONT>
<BR><FONT SIZE=2>> ===================================================================</FONT>
<BR><FONT SIZE=2>> --- ulp/wsd/user/ibsp_iblow.c (revision 127)</FONT>
<BR><FONT SIZE=2>> +++ ulp/wsd/user/ibsp_iblow.c (working copy)</FONT>
<BR><FONT SIZE=2>> @@ -127,6 +127,7 @@</FONT>
<BR><FONT SIZE=2>> cl_spinlock_release( &socket_info->recv_lock );</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> cl_spinlock_release( &socket_info->mutex );</FONT>
<BR><FONT SIZE=2>> + p_io_info->p_ov = NULL;</FONT>
<BR><FONT SIZE=2>> IBSP_EXIT( IBSP_DBG_IO );</FONT>
<BR><FONT SIZE=2>> return;</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>> </FONT>
</P>
</BODY>
</HTML>