[openib-general] [PATCH] 2.6.20 rdma_ucm: fix reporting events with invalid user context
Sean Hefty
sean.hefty at intel.com
Fri Jan 5 12:35:15 PST 2007
There's a problem with how rdma cm events are reported to userspace that can
lead to application crashes.
When a new connection request arrives, a context for the connection is allocated
in the kernel. The connection event is then reported to userspace. The
userspace library retrieves the event and allocates its own context for the
connection. The userspace context is associated with the kernel's context when
the user accepts the connection. This allows the kernel to report the userspace
context with subsequent events.
A problem occurs if a second event for the same connection occurs before the
user has had a chance to call accept. The userspace context has not yet been
set, which causes librdmacm to crash. (This has been seen when the app
takes too long to call accept, resulting in the remote side timing out and
rejecting the connection.)
Fix this by ignoring events for new connections until userspace has set their
context. Such an event can only be generated if an error occurs on a new
connection before the user accepts it. Dropping it is okay, since the accept
will just fail later.
Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
If everyone agrees with this approach, I'd like to try to push it into
2.6.20. It will definitely be needed for OFED, as we hit this problem
when scaling up with DAPL.
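
For anyone who wants to see the check outside the kernel, here is a minimal
userspace sketch of the filtering rule. It is only an illustration, not the
ucma code: every name prefixed with sketch_ is hypothetical, the event queue
is a fixed array rather than a list, and backlog handling is left out. It
assumes, as in the patch, that a uid of 0 means userspace has not yet set its
context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's event types. */
enum sketch_event_type {
	SKETCH_CONNECT_REQUEST,
	SKETCH_REJECTED,
	SKETCH_ESTABLISHED,
};

#define SKETCH_MAX_EVENTS 8

/* Stand-in for the per-file event list that userspace reads from. */
struct sketch_file {
	enum sketch_event_type queued[SKETCH_MAX_EVENTS];
	size_t nqueued;
};

/* Stand-in for the kernel's per-connection context. */
struct sketch_ctx {
	uint64_t uid;			/* userspace context; 0 until set */
	struct sketch_file *file;
};

/*
 * Returns true if the event was queued for userspace, false if it was
 * dropped because userspace has not yet set its context.
 */
static bool sketch_event_handler(struct sketch_ctx *ctx,
				 enum sketch_event_type type)
{
	if (type == SKETCH_CONNECT_REQUEST) {
		/* New connection requests are always reported. */
	} else if (!ctx->uid) {
		/* No userspace context yet: ignore the event. */
		return false;
	}

	assert(ctx->file->nqueued < SKETCH_MAX_EVENTS);
	ctx->file->queued[ctx->file->nqueued++] = type;
	return true;
}
```

With this sketch, a reject that arrives while uid is still 0 is dropped
instead of being queued, while the same event is queued normally once uid has
been set.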
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 9f30f9b..e2e8d32 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -213,7 +213,17 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
 			goto out;
 		}
 		ctx->backlog--;
+	} else if (!ctx->uid) {
+		/*
+		 * We ignore events for new connections until userspace has set
+		 * their context. This can only happen if an error occurs on a
+		 * new connection before the user accepts it. This is okay,
+		 * since the accept will just fail later.
+		 */
+		kfree(uevent);
+		goto out;
 	}
+
 	list_add_tail(&uevent->list, &ctx->file->event_list);
 	wake_up_interruptible(&ctx->file->poll_wait);
 out: