<html><body>
<p>I prefer the original patch, the reason is: drain_list only have one element, which the first of of the flush list, whenever its drain WR processed, it will be put in reap_list, why we need drain_list here?<br>
<br>
Thanks<br>
Shirley<br>
<br>
<img width="16" height="16" src="cid:1__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt="Inactive hide details for Roland Dreier <rdreier@cisco.com>">Roland Dreier <rdreier@cisco.com><br>
<br>
<br>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr valign="top"><td style="background-image:url(cid:2__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com); background-repeat: no-repeat; " width="29%">
<ul>
<ul>
<ul>
<ul><b><font size="2">Roland Dreier <rdreier@cisco.com></font></b><font size="2"> </font>
<p><font size="2">06/25/08 11:45 AM</font></ul>
</ul>
</ul>
</ul>
</td><td width="71%">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr valign="top"><td width="1%"><img width="58" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""><br>
<div align="right"><font size="2">To</font></div></td><td width="100%"><img width="1" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""><br>
<font size="2"><eli@mellanox.co.il></font></td></tr>
<tr valign="top"><td width="1%"><img width="58" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""><br>
<div align="right"><font size="2">cc</font></div></td><td width="100%"><img width="1" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""><br>
<font size="2">Shirley Ma/Beaverton/IBM@IBMUS, <general-bounces@lists.openfabrics.org>, "Olga Stern" <olgas@Voltaire.COM>, "OpenFabrics General" <general@lists.openfabrics.org></font></td></tr>
<tr valign="top"><td width="1%"><img width="58" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""><br>
<div align="right"><font size="2">Subject</font></div></td><td width="100%"><img width="1" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""><br>
<font size="2">Re: [ofa-general] Re: [RFC][PATCH] last wqe event handler patch</font></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top"><td width="58"><img width="1" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""></td><td width="336"><img width="1" height="1" src="cid:3__=08BBFEE0DFE24C6B8f9e8a93df938@us.ibm.com" border="0" alt=""></td></tr>
</table>
</td></tr>
</table>
<br>
<tt>So I think I'll queue this up -- does this make sense to everyone?<br>
<br>
commit af806651d0042ca014eef62832c7e7083013b954<br>
Author: Roland Dreier <rolandd@cisco.com><br>
Date: Wed Jun 25 11:44:47 2008 -0700<br>
<br>
IPoIB/cm: Drain receive QPs one at a time<br>
<br>
The current handling of last WQE reached events is broken when<br>
multiple QPs are drained at once -- for example:<br>
<br>
1. Last WQE reached event for QP100: QP100 context is added into<br>
flush_list, and then it is put into drain_list, and does post_send<br>
of a drain WR.<br>
<br>
2. Last WQE reached event for QP200: QP200 context is added into<br>
flush_list, but not drain_list since only one drain WR will be<br>
posted.<br>
<br>
3. Last WQE reached event for QP300: QP300 context is added into<br>
flush_list, but not drain_list since only one drain WR will be<br>
posted.<br>
<br>
So at this point, QP100 is on drain_list and QP200 and QP300 are on<br>
flush_list.<br>
<br>
4. QP100 drain WR CQE is handled: put QP100 into reap_list then call<br>
ipoib_cm_start_rx_drain(), which does post_send of QP200 drain WR,<br>
and moves both QP200 and QP300 from flush_list to drain_list.<br>
<br>
5. QP200 drain WR CQE is polled, which will move both QP200 and QP300<br>
from drain_list to reap_list.<br>
<br>
6. QP200 and QP300 are both freed from reap_list.<br>
<br>
7. A CQE for QP300 is seen, but QP300 context has been freed ---> crash.<br>
<br>
The fundamental problem is that a last WQE reached event does not mean<br>
all pending completions have been queued.<br>
<br>
Fix this by moving QPs from the flush_list to the drain_list one at a<br>
time, so that we never free QPs until we really know that we are done<br>
with them.<br>
<br>
Debugged-by: Shirley Ma <xma@us.ibm.com><br>
Fix-suggested-by: Eli Cohen <eli@mellanox.co.il><br>
Signed-off-by: Roland Dreier <rolandd@cisco.com><br>
<br>
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c<br>
index 6223fc3..793f2bd 100644<br>
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c<br>
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c<br>
@@ -222,7 +222,7 @@ static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv *priv)<br>
if (ib_post_send(p->qp, &ipoib_cm_rx_drain_wr, &bad_wr))<br>
ipoib_warn(priv, "failed to post drain wr\n");<br>
<br>
- list_splice_init(&priv->cm.rx_flush_list, &priv->cm.rx_drain_list);<br>
+ list_move(&p->list, &priv->cm.rx_drain_list);<br>
}<br>
<br>
static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)<br>
</tt><br>
</body></html>