[openib-general] RFC fix for userspace rdma cm crashes

Caitlin Bestler caitlinb at broadcom.com
Thu Jan 4 13:36:42 PST 2007


openib-general-bounces at openib.org wrote:
> There's a problem with how rdma cm events are reported to
> userspace that can lead to application crashes.
> 
> When a new connection request arrives, a context for the
> connection is allocated in the kernel.  The connection event
> is then reported to userspace.  The userspace library
> retrieves the event and allocates its own context for the
> connection.  The userspace context is associated with the
> kernel's context when accepting.  This allows the kernel to
> return the userspace context with later events.
> 
> A problem occurs if a second event for the same connection
> occurs before the user has had a chance to call accept.  The
> userspace context has not yet been set, which causes the
> librdmacm to crash.  (This has been seen when the app takes
> too long to call accept, resulting in the remote side timing
> out and rejecting the connection.)
> 

I'm missing something here. How does a second event "for the
same connection" occur before the connection does?

That is, if the user has not accepted the connection request,
how can a later packet be related to the "same connection"?

If this is related strictly to retransmitted connection requests,
should userspace be involved at all? Or should that be handled by
the kernel connection managers and/or the device-specific verbs?
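
To make the question concrete, here is a minimal sketch of the
sequence as I read it from the description above. It is a toy model,
not librdmacm or kernel code: every type and function name below is
hypothetical, and it assumes the kernel simply copies whatever
userspace context pointer is currently set into each event it reports.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the real librdmacm or kernel structures. */
struct user_ctx {
    int id;
};

struct kernel_ctx {
    struct user_ctx *uctx;  /* stays NULL until userspace calls accept */
};

enum ev_type { EV_CONNECT_REQUEST, EV_REJECTED };

struct event {
    enum ev_type type;
    struct user_ctx *uctx;  /* snapshot of the kernel context's pointer */
};

/* Kernel side: report an event, attaching whatever userspace context is set. */
static struct event report_event(const struct kernel_ctx *k, enum ev_type type)
{
    struct event ev = { type, k->uctx };
    return ev;
}

/* Userspace side: accepting associates the userspace context with the kernel one. */
static void accept_conn(struct kernel_ctx *k, struct user_ctx *u)
{
    assert(u != NULL);
    k->uctx = u;
}

/* Returns 1 if the event carries a usable userspace context, 0 otherwise. */
static int event_has_user_ctx(const struct event *ev)
{
    return ev->uctx != NULL;
}
```

In this model, an event reported before accept_conn() carries a NULL
userspace context, so a library that dereferences it without a check
such as event_has_user_ctx() would fault. That would match the crash
described, if a reject really can be delivered at that point.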




