[ofa-general] Fwd: [rds-devel] RDS: Fix a bug in RDMA signalling
Olaf Kirch
okir at lst.de
Tue May 27 01:14:29 PDT 2008
Oops, this should have gone to ofa-general as well.
Olaf
---------- Forwarded Message ----------
Subject: [rds-devel] RDS: Fix a bug in RDMA signalling
Date: Tuesday 27 May 2008
From: Olaf Kirch <olaf.kirch at oracle.com>
To: rds-devel at oss.oracle.com
I found an issue with RDMA completions vs. IB connection teardown
yesterday - the patch seems correct to me (and passed my testing),
but I would appreciate a second set of eyeballs, and some additional
testing (Rick, can you spin the crload wheel with this one, please?)
Vlad, is there any chance to get this into 1.3.1? I think I remember
that rc2 (which is out already) was supposed to be the final release
candidate for 1.3.1 - correct?
The patch is also available from my tree at
git://git.openfabrics.org/~okir/ofed_1_3/linux-2.6.git code-drop-20080527
Olaf
-------------------
From: Olaf Kirch <olaf.kirch at oracle.com>
Subject: RDS: Fix a bug in RDMA signalling
Code inspection revealed a problem in the way we signal RDMA
completions to user space when a connection goes down.
The send CQ handler calls rds_ib_send_unmap_rm, indicating success
no matter what the WC status code says. This means we may
signal success for RDMAs that get flushed out with WR_FLUSH_ERR.
This patch fixes the problem by passing the wc.status value to
rds_ib_send_unmap_rm for inspection.
While I was at it, I moved the code that translates WC status codes
to RDMA notifications from the send CQ handler to rds_ib_send_unmap_rm
where it belongs.
Signed-off-by: Olaf Kirch <olaf.kirch at oracle.com>
---
net/rds/ib_send.c | 39 ++++++++++++++++++++-------------------
net/rds/rdma.h | 2 +-
net/rds/send.c | 3 ++-
3 files changed, 23 insertions(+), 21 deletions(-)
Index: ofa_kernel-1.3.1/net/rds/ib_send.c
===================================================================
--- ofa_kernel-1.3.1.orig/net/rds/ib_send.c
+++ ofa_kernel-1.3.1/net/rds/ib_send.c
@@ -41,7 +41,7 @@
void rds_ib_send_unmap_rm(struct rds_ib_connection *ic,
struct rds_ib_send_work *send,
- int success)
+ int wc_status)
{
struct rds_message *rm = send->s_rm;
@@ -52,7 +52,9 @@ void rds_ib_send_unmap_rm(struct rds_ib_
DMA_TO_DEVICE);
/* raise rdma completion hwm */
- if (rm->m_rdma_op && success) {
+ if (rm->m_rdma_op && wc_status != IB_WC_WR_FLUSH_ERR) {
+ int notify_status;
+
/* If the user asked for a completion notification on this
* message, we can implement three different semantics:
* 1. Notify when we received the ACK on the RDS message
@@ -68,7 +70,20 @@ void rds_ib_send_unmap_rm(struct rds_ib_
* don't call rds_rdma_send_complete at all, and fall back to the notify
* handling in the ACK processing code.
*/
- rds_rdma_send_complete(rm);
+ switch (wc_status) {
+ case IB_WC_SUCCESS:
+ notify_status = RDS_RDMA_SUCCESS;
+ break;
+
+ case IB_WC_REM_ACCESS_ERR:
+ notify_status = RDS_RDMA_REMOTE_ERROR;
+ break;
+
+ default:
+ notify_status = RDS_RDMA_OTHER_ERROR;
+ break;
+ }
+ rds_rdma_send_complete(rm, notify_status);
if (rm->m_rdma_op->r_write)
rds_stats_add(s_send_rdma_bytes, rm->m_rdma_op->r_bytes);
@@ -118,7 +133,7 @@ void rds_ib_send_clear_ring(struct rds_i
if (send->s_wr.opcode == 0xdead)
continue;
if (send->s_rm)
- rds_ib_send_unmap_rm(ic, send, 0);
+ rds_ib_send_unmap_rm(ic, send, IB_WC_WR_FLUSH_ERR);
if (send->s_op)
ib_dma_unmap_sg(ic->i_cm_id->device,
send->s_op->r_sg, send->s_op->r_nents,
@@ -174,7 +189,7 @@ void rds_ib_send_cq_comp_handler(struct
switch (send->s_wr.opcode) {
case IB_WR_SEND:
if (send->s_rm)
- rds_ib_send_unmap_rm(ic, send, 1);
+ rds_ib_send_unmap_rm(ic, send, wc.status);
break;
case IB_WR_RDMA_WRITE:
if (send->s_op)
@@ -204,20 +219,6 @@ void rds_ib_send_cq_comp_handler(struct
oldest = (oldest + 1) % ic->i_send_ring.w_nr;
}
- if (unlikely(wc.status != IB_WC_SUCCESS && send->s_op && send->s_op->r_notifier)) {
- switch (wc.status) {
- default:
- send->s_op->r_notifier->n_status = RDS_RDMA_OTHER_ERROR;
- break;
- case IB_WC_REM_ACCESS_ERR:
- send->s_op->r_notifier->n_status = RDS_RDMA_REMOTE_ERROR;
- break;
- case IB_WC_WR_FLUSH_ERR:
- /* flushed out; not an error */
- break;
- }
- }
-
rds_ib_ring_free(&ic->i_send_ring, completed);
if (test_and_clear_bit(RDS_LL_SEND_FULL, &conn->c_flags)
Index: ofa_kernel-1.3.1/net/rds/rdma.h
===================================================================
--- ofa_kernel-1.3.1.orig/net/rds/rdma.h
+++ ofa_kernel-1.3.1/net/rds/rdma.h
@@ -71,6 +71,6 @@ int rds_cmsg_rdma_args(struct rds_sock *
int rds_cmsg_rdma_map(struct rds_sock *rs, struct rds_message *rm,
struct cmsghdr *cmsg);
void rds_rdma_free_op(struct rds_rdma_op *ro);
-void rds_rdma_send_complete(struct rds_message *rm);
+void rds_rdma_send_complete(struct rds_message *rm, int);
#endif
Index: ofa_kernel-1.3.1/net/rds/send.c
===================================================================
--- ofa_kernel-1.3.1.orig/net/rds/send.c
+++ ofa_kernel-1.3.1/net/rds/send.c
@@ -361,7 +361,7 @@ int rds_send_acked_before(struct rds_con
* the IB send completion on the RDMA op and the accompanying
* message.
*/
-void rds_rdma_send_complete(struct rds_message *rm)
+void rds_rdma_send_complete(struct rds_message *rm, int status)
{
struct rds_sock *rs = NULL;
struct rds_rdma_op *ro;
@@ -376,6 +376,7 @@ void rds_rdma_send_complete(struct rds_m
rs = rm->m_rs;
sock_hold(rds_rs_to_sk(rs));
+ notifier->n_status = status;
spin_lock(&rs->rs_lock);
list_add_tail(¬ifier->n_list, &rs->rs_notify_queue);
spin_unlock(&rs->rs_lock);
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
_______________________________________________
rds-devel mailing list
rds-devel at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/rds-devel
-------------------------------------------------------
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
More information about the general
mailing list