[ofa-general] [PATCH 1/3] Change RDMA completion notifications
Olaf Kirch
olaf.kirch at oracle.com
Wed May 7 03:51:52 PDT 2008
commit 9194a75cf945beee95f8fb8ab08015d05aa797d4
Author: Olaf Kirch <olaf.kirch at oracle.com>
Date: Wed May 7 10:40:13 2008 +0200
Change RDMA completion notifications
If the user asked for a completion notification on RDMA ops,
we can implement three different semantics:
1. Notify when we receive the ACK on the RDS message
that was queued with the RDMA. This provides reliable
notification of RDMA status at the expense of a one-way
packet delay.
2. Notify when the IB stack gives us the completion event for
the RDMA operation.
3. Notify when the IB stack gives us the completion event for
the accompanying RDS messages.
In OFED 1.3, RDS implemented approach #1. This turns out to be too slow
for some purposes, so I'm switching to approach #3 with this patch.
I'm leaving the old code in place however, so that we can support
different modes later if we want.
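To make the difference between the three semantics concrete, here is a small user-space sketch (not kernel code, and not part of the patch). All names in it (notify_mode, toy_message, toy_handle_event, the EV_* events) are hypothetical and exist only for illustration. It assumes that each message is notified at most once, and that the ACK path still delivers any notification left pending, which is how I read "leaving the old code in place".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; none of these exist in the RDS sources. */
enum notify_mode {
	NOTIFY_ON_ACK = 1,             /* approach #1: ACK for the RDS message */
	NOTIFY_ON_RDMA_COMPLETION = 2, /* approach #2: IB completion of the RDMA op */
	NOTIFY_ON_SEND_COMPLETION = 3, /* approach #3: IB completion of the RDS message */
};

enum toy_event {
	EV_RDMA_COMPLETION, /* IB stack completed the RDMA operation */
	EV_SEND_COMPLETION, /* IB stack completed the accompanying RDS message */
	EV_ACK_RECEIVED,    /* peer ACKed the RDS message */
};

struct toy_message {
	bool wants_notify; /* user asked for a completion notification */
	bool notified;     /* notification was already delivered */
	int notify_count;  /* number of notifications delivered so far */
};

/*
 * Returns true if this event delivers the notification.
 * Assumption: the ACK path acts as a fallback in every mode and picks up
 * a notification that is still pending when the ACK arrives.
 */
bool toy_handle_event(struct toy_message *msg, enum notify_mode mode,
		      enum toy_event ev)
{
	bool fire = false;

	assert(msg != NULL);

	/* Nothing to do if no notification was requested or it already fired. */
	if (!msg->wants_notify || msg->notified)
		return false;

	switch (ev) {
	case EV_RDMA_COMPLETION:
		fire = (mode == NOTIFY_ON_RDMA_COMPLETION);
		break;
	case EV_SEND_COMPLETION:
		fire = (mode == NOTIFY_ON_SEND_COMPLETION);
		break;
	case EV_ACK_RECEIVED:
		fire = true; /* fallback: deliver whatever is still pending */
		break;
	}

	if (fire) {
		msg->notified = true;
		msg->notify_count++;
	}
	return fire;
}
```

With NOTIFY_ON_SEND_COMPLETION, the notification fires on the send completion and the later ACK finds nothing left to deliver. With NOTIFY_ON_ACK, both completion events are ignored and the user waits for the ACK, which is the extra delay approach #1 pays for its reliability.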
Signed-off-by: Olaf Kirch <olaf.kirch at oracle.com>
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 4bbab10..724167c 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -53,6 +53,23 @@ void rds_ib_send_unmap_rm(struct rds_ib_connection *ic,
/* raise rdma completion hwm */
if (rm->m_rdma_op && success) {
+ /* If the user asked for a completion notification on this
+ * message, we can implement three different semantics:
+ * 1. Notify when we received the ACK on the RDS message
+ * that was queued with the RDMA. This provides reliable
+ * notification of RDMA status at the expense of a one-way
+ * packet delay.
+ * 2. Notify when the IB stack gives us the completion event for
+ * the RDMA operation.
+ * 3. Notify when the IB stack gives us the completion event for
+ * the accompanying RDS messages.
+ * Here, we implement approach #3. To implement approach #2,
+ * call rds_rdma_send_complete from the cq_handler. To implement #1,
+ * don't call rds_rdma_send_complete at all, and fall back to the notify
+ * handling in the ACK processing code.
+ */
+ rds_rdma_send_complete(rm);
+
if (rm->m_rdma_op->r_write)
rds_stats_add(s_send_rdma_bytes, rm->m_rdma_op->r_bytes);
else
diff --git a/net/rds/rdma.h b/net/rds/rdma.h
index 2ff0cea..289f962 100644
--- a/net/rds/rdma.h
+++ b/net/rds/rdma.h
@@ -71,5 +71,6 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
int rds_cmsg_rdma_map(struct rds_sock *rs, struct rds_message *rm,
struct cmsghdr *cmsg);
void rds_rdma_free_op(struct rds_rdma_op *ro);
+void rds_rdma_send_complete(struct rds_message *rm);
#endif
diff --git a/net/rds/send.c b/net/rds/send.c
index 26e1e3e..2b7661d 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -356,6 +356,42 @@ int rds_send_acked_before(struct rds_connection *conn, u64 seq)
}
/*
+ * This is pretty similar to what happens below in the ACK
+ * handling code - except that we call here as soon as we get
+ * the IB send completion on the RDMA op and the accompanying
+ * message.
+ */
+void rds_rdma_send_complete(struct rds_message *rm)
+{
+ struct rds_sock *rs = NULL;
+ struct rds_rdma_op *ro;
+ struct rds_notifier *notifier;
+
+ spin_lock(&rm->m_rs_lock);
+
+ ro = rm->m_rdma_op;
+ if (test_bit(RDS_MSG_ON_SOCK, &rm->m_flags)
+ && ro && ro->r_notify
+ && (notifier = ro->r_notifier) != NULL) {
+ rs = rm->m_rs;
+ sock_hold(rds_rs_to_sk(rs));
+
+ spin_lock(&rs->rs_lock);
+ list_add_tail(&notifier->n_list, &rs->rs_notify_queue);
+ spin_unlock(&rs->rs_lock);
+
+ ro->r_notifier = NULL;
+ }
+
+ spin_unlock(&rm->m_rs_lock);
+
+ if (rs) {
+ rds_wake_sk_sleep(rs);
+ sock_put(rds_rs_to_sk(rs));
+ }
+}
+
+/*
* This removes messages from the socket's list if they're on it. The list
* argument must be private to the caller, we must be able to modify it
* without locks. The messages must have a reference held for their
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax