[openib-general] RFC fix for userspace rdma cm crashes
Steve Wise
swise at opengridcomputing.com
Thu Jan 4 13:04:25 PST 2007
On Thu, 2007-01-04 at 12:11 -0800, Sean Hefty wrote:
> There's a problem with how rdma cm events are reported to userspace that can
> lead to application crashes.
>
> When a new connection request arrives, a context for the connection is allocated
> in the kernel. The connection event is then reported to userspace. The
> userspace library retrieves the event and allocates its own context for the
> connection. The userspace context is associated with the kernel's context when
> accepting. This allows the kernel to return the userspace context with later events.
>
> A problem occurs if a second event for the same connection occurs before the
> user has had a chance to call accept. The userspace context has not yet been
> set, which causes librdmacm to crash. (This has been seen when the app
> takes too long to call accept, resulting in the remote side timing out and
> rejecting the connection.)
>
> I can think of a couple of possible fixes for this, but wanted to get input. I
> believe that this can be fixed in either the kernel or userspace code. A kernel
> fix could queue events until the context has been set. A userspace fix could
> store its contexts in a map, and look up the correct one if it is not given.
>
Does this affect kernel rdma cm users too in some way? If not, then
perhaps the place to fix it is in userspace.
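
For discussion, here is a rough sketch of what the userspace map could look
like. It is not librdmacm code: every name below (ctx_map, user_ctx,
resolve_ctx, the kernel_id key) is hypothetical, and it assumes each event
carries some kernel-assigned id for the connection that the library can use as
a key. Removal and locking are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not librdmacm's actual code. */
#define CTX_MAP_SLOTS 64

/* Stand-in for the library's per-connection userspace context. */
struct user_ctx {
    uint32_t kernel_id;   /* assumed: id the kernel assigned to the connection */
};

/* Fixed-size open-addressing table keyed by kernel_id. */
struct ctx_map {
    struct user_ctx *slot[CTX_MAP_SLOTS];
};

/* Record a context as soon as the library allocates it.
 * Returns 0 on success, -1 if the table is full. */
int ctx_map_insert(struct ctx_map *m, struct user_ctx *ctx)
{
    assert(m != NULL && ctx != NULL);
    size_t start = ctx->kernel_id % CTX_MAP_SLOTS;
    for (size_t i = 0; i < CTX_MAP_SLOTS; i++) {
        size_t s = (start + i) % CTX_MAP_SLOTS;
        if (m->slot[s] == NULL) {
            m->slot[s] = ctx;
            return 0;
        }
    }
    return -1;
}

/* Find the context for a kernel id, or NULL if none is recorded.
 * With no removal, the first empty slot ends the probe sequence. */
struct user_ctx *ctx_map_find(const struct ctx_map *m, uint32_t kernel_id)
{
    assert(m != NULL);
    size_t start = kernel_id % CTX_MAP_SLOTS;
    for (size_t i = 0; i < CTX_MAP_SLOTS; i++) {
        size_t s = (start + i) % CTX_MAP_SLOTS;
        if (m->slot[s] == NULL)
            return NULL;
        if (m->slot[s]->kernel_id == kernel_id)
            return m->slot[s];
    }
    return NULL;
}

/* Use the context delivered with the event when the kernel has one;
 * otherwise fall back to the map. */
struct user_ctx *resolve_ctx(const struct ctx_map *m,
                             struct user_ctx *from_event,
                             uint32_t kernel_id)
{
    return from_event != NULL ? from_event : ctx_map_find(m, kernel_id);
}
```

The event path would call resolve_ctx() instead of trusting the context
delivered with the event, so a reject that arrives before accept still finds
the right connection state rather than dereferencing an unset pointer.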