[ewg] [PATCH] Request For Comments:

Steve Wise swise at opengridcomputing.com
Tue May 6 10:02:30 PDT 2008


From: Steve Wise <swise at opengridcomputing.com>

Here is the top level API change I'm proposing for enabling interoperable
peer2peer mode for iwarp.  I want to get agreement on how to expose
this to the application before posting more of the gritty details of
the kernel driver changes needed. The plan is to include this support
in linux-2.6.27 + ofed-1.4.

Does this require an ABI bump?

Note:  We could do this several ways.  I'm proposing one with this
uncompiled patch.  The downside of my proposal is that applications have
to change to turn this on.  However, I'm not sure that's too painful.
We would have OMPI turn it on, and maybe even uDAPL, so that all uDAPL
ULPs (IMPI, dapltest, HPMPI) would get it.

Alternative designs:

- always do peer2peer and don't let the app choose.  This forces
the overhead of p2p mode on all apps, but preserves the API.

- use an environment variable that librdmacm will query.  This doesn't
force p2p, and has the benefit of not changing the API, but at the
expense of adding environment variables to the rdma-cm model.  Environment
variables are already used extensively in MPIs and even DAPL, so I think
it's an alternative we should consider.  This approach, however, doesn't
help kernel applications.
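To make the environment-variable alternative concrete, here is a minimal
sketch of how librdmacm might read such a setting.  The variable name
RDMA_CM_PEER2PEER and the helper peer2peer_from_env() are hypothetical;
nothing in this proposal names a variable or defines a parsing rule.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL variable name: the proposal does not specify one. */
#define SKETCH_P2P_ENV "RDMA_CM_PEER2PEER"

/* Hypothetical helper: returns 1 when the named environment variable
 * holds a non-zero integer, and 0 when it is unset, empty, zero, or
 * not a number (i.e. p2p stays off by default). */
static uint8_t peer2peer_from_env(const char *name)
{
	const char *val;
	char *end;
	long n;

	assert(name != NULL);
	val = getenv(name);
	if (val == NULL || *val == '\0')
		return 0;

	n = strtol(val, &end, 10);
	if (end == val)
		return 0;	/* no digits parsed: keep the default */
	return (uint8_t)(n != 0);
}
```

librdmacm could call something like this when building the connect
parameters, so user applications get p2p without source changes.  Because
getenv() only exists in user space, this path still leaves kernel ULPs
without a way to opt in.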


Steve.

-----

Peer2peer support in librdmacm.

User applications can set a new u8 boolean named peer2peer_mode in
the rdma_conn_param struct to indicate whether they require peer2peer
mode support, i.e. their own application logic does not enforce the
iwarp "client must send first" requirement.  If they set peer2peer_mode
to 1, the iwarp CM and drivers will handle this requirement.
Applications that don't require this should set peer2peer_mode to 0 to
reduce the messages exchanged at iwarp connection setup.
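The sketch below mirrors what the two cma.c hunks do: the flag travels
from the application's connect parameters into the kernel ABI struct, and
back out in connection events.  The structs are trimmed, hypothetical
stand-ins (conn_param_sketch, abi_conn_param_sketch), not the real
rdma_conn_param / ucma_abi_conn_param, and SKETCH_MAX_PRIVATE_DATA is an
arbitrary size chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary size for this sketch; not taken from rdma_cma_abi.h. */
#define SKETCH_MAX_PRIVATE_DATA 56

/* Hypothetical, trimmed stand-in for struct rdma_conn_param. */
struct conn_param_sketch {
	const void *private_data;
	uint8_t private_data_len;
	uint8_t retry_count;
	uint8_t rnr_retry_count;
	uint8_t peer2peer_mode;	/* proposed: 1 = CM/driver enforces "client sends first" */
};

/* Hypothetical, trimmed stand-in for struct ucma_abi_conn_param. */
struct abi_conn_param_sketch {
	uint8_t private_data[SKETCH_MAX_PRIVATE_DATA];
	uint8_t private_data_len;
	uint8_t retry_count;
	uint8_t rnr_retry_count;
	uint8_t valid;
	uint8_t peer2peer_mode;	/* proposed new ABI field */
};

/* Mirrors ucma_copy_conn_param_to_kern(): copy the new flag with the rest. */
static void copy_conn_param_to_kern(struct abi_conn_param_sketch *dst,
				    const struct conn_param_sketch *src)
{
	assert(dst != NULL && src != NULL);
	memset(dst, 0, sizeof(*dst));
	dst->retry_count = src->retry_count;
	dst->rnr_retry_count = src->rnr_retry_count;
	dst->valid = 1;
	dst->peer2peer_mode = src->peer2peer_mode;

	if (src->private_data && src->private_data_len) {
		size_t len = src->private_data_len;

		/* Clamp so the sketch never overruns the fixed-size buffer. */
		if (len > sizeof(dst->private_data))
			len = sizeof(dst->private_data);
		memcpy(dst->private_data, src->private_data, len);
		dst->private_data_len = (uint8_t)len;
	}
}

/* Mirrors ucma_copy_conn_event(): report the flag back to the app. */
static void copy_conn_event(struct conn_param_sketch *dst,
			    const struct abi_conn_param_sketch *src)
{
	assert(dst != NULL && src != NULL);
	memset(dst, 0, sizeof(*dst));
	dst->private_data = src->private_data;
	dst->private_data_len = src->private_data_len;
	dst->retry_count = src->retry_count;
	dst->rnr_retry_count = src->rnr_retry_count;
	dst->peer2peer_mode = src->peer2peer_mode;
}
```

With the patch applied, an application would instead set peer2peer_mode
in the struct rdma_conn_param it already passes to rdma_connect() or
rdma_accept(); nothing else in its connection setup changes.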

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 include/rdma/rdma_cma.h     |    1 +
 include/rdma/rdma_cma_abi.h |    1 +
 src/cma.c                   |    2 ++
 3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index 76df90f..943aa45 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -118,6 +118,7 @@ struct rdma_conn_param {
 	uint8_t flow_control;
 	uint8_t retry_count;		/* ignored when accepting */
 	uint8_t rnr_retry_count;
+	uint8_t peer2peer_mode;
 	/* Fields below ignored if a QP is created on the rdma_cm_id. */
 	uint8_t srq;
 	uint32_t qp_num;
diff --git a/include/rdma/rdma_cma_abi.h b/include/rdma/rdma_cma_abi.h
index 1a3a9c2..5914aaa 100644
--- a/include/rdma/rdma_cma_abi.h
+++ b/include/rdma/rdma_cma_abi.h
@@ -140,6 +140,7 @@ struct ucma_abi_conn_param {
 	__u8  retry_count;
 	__u8  rnr_retry_count;
 	__u8  valid;
+	__u8  peer2peer_mode;
 };
 
 struct ucma_abi_ud_param {
diff --git a/src/cma.c b/src/cma.c
index fc98c8f..dbbb2e8 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -844,6 +844,7 @@ static void ucma_copy_conn_param_to_kern(struct ucma_abi_conn_param *dst,
 	dst->retry_count = src->retry_count;
 	dst->rnr_retry_count = src->rnr_retry_count;
 	dst->valid = 1;
+	dst->peer2peer_mode = src->peer2peer_mode;
 
 	if (src->private_data && src->private_data_len) {
 		memcpy(dst->private_data, src->private_data,
@@ -1261,6 +1262,7 @@ static void ucma_copy_conn_event(struct cma_event *event,
 	dst->rnr_retry_count = src->rnr_retry_count;
 	dst->srq = src->srq;
 	dst->qp_num = src->qp_num;
+	dst->peer2peer_mode = src->peer2peer_mode;
 }
 
 static void ucma_copy_ud_event(struct cma_event *event,