***SPAM*** Re: [ofa-general] Re: [RFC][PATCH] last wqe event handler patch
Shirley Ma
xma at us.ibm.com
Wed Jun 25 13:45:31 PDT 2008
I prefer the original patch, the reason is: drain_list only have one
element, which the first of of the flush list, whenever its drain WR
processed, it will be put in reap_list, why we need drain_list here?
Thanks
Shirley
Roland
Dreier
<rdreier@ To
cisco.com <eli at mellanox.co.il>
> cc
Shirley Ma/Beaverton/IBM at IBMUS,
06/25/08 <general-bounces at lists.openfabrics.org>,
11:45 AM "Olga Stern" <olgas at Voltaire.COM>,
"OpenFabrics General"
<general at lists.openfabrics.org>
Subject
Re: [ofa-general] Re: [RFC][PATCH] last wqe
event handler patch
So I think I'll queue this up -- does this make sense to everyone?
commit af806651d0042ca014eef62832c7e7083013b954
Author: Roland Dreier <rolandd at cisco.com>
Date: Wed Jun 25 11:44:47 2008 -0700
IPoIB/cm: Drain receive QPs one at a time
The current handling of last WQE reached events is broken when
multiple QPs are drained at once -- for example:
1. Last WQE reached event for QP100: QP100 context is added into
flush_list, and then it is put into drain_list, and does post_send
of a drain WR.
2. Last WQE reached event for QP200: QP200 context is added into
flush_list, but not drain_list since only one drain WR will be
posted.
3. Last WQE reached event for QP300: QP300 context is added into
flush_list, but not drain_list since only one drain WR will be
posted.
So at this point, QP100 is on drain_list and QP200 and QP300 are on
flush_list.
4. QP100 drain WR CQE is handled: put QP100 into reap_list then call
ipoib_cm_start_rx_drain(), which does post_send of QP200 drain WR,
and moves both QP200 and QP300 from flush_list to drain_list.
5. QP200 drain WR CQE is polled, which will move both QP200 and QP300
from drain_list to reap_list.
6. QP200 and QP300 are both freed from reap_list.
7. A CQE for QP300 is seen, but QP300 context has been freed --->
crash.
The fundamental problem is that a last WQE reached event does not mean
all pending completions have been queued.
Fix this by moving QPs from the flush_list to the drain_list one at a
time, so that we never free QPs until we really know that we are done
with them.
Debugged-by: Shirley Ma <xma at us.ibm.com>
Fix-suggested-by: Eli Cohen <eli at mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd at cisco.com>
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 6223fc3..793f2bd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -222,7 +222,7 @@ static void ipoib_cm_start_rx_drain(struct
ipoib_dev_priv *priv)
if (ib_post_send(p->qp, &ipoib_cm_rx_drain_wr, &bad_wr))
ipoib_warn(priv, "failed to post drain wr\n");
- list_splice_init(&priv->cm.rx_flush_list,
&priv->cm.rx_drain_list);
+ list_move(&p->list, &priv->cm.rx_drain_list);
}
static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080625/42b83d6b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080625/42b83d6b/attachment.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pic21829.gif
Type: image/gif
Size: 1255 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080625/42b83d6b/attachment-0001.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ecblank.gif
Type: image/gif
Size: 45 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20080625/42b83d6b/attachment-0002.gif>
More information about the general
mailing list