[openib-general] RFC fix for userspace rdma cm crashes
mshefty at ichips.intel.com
Thu Jan 4 14:06:29 PST 2007
> I'm missing something here. How does a second event "for the
> same connection" occur before the connection does?
> That is, if the user has not accepted the connection request,
> how can a later packet be related to the "same connection"?
> If this is related strictly to retransmitted connection requests,
> then should userspace be involved? Should that be handled by
> the kernel connection managers and/or the device specific verbs?
Here's the actual problem that we see that hopefully explains the issue better:
Node A sends an IB CM REQ to node B.
This results in new connection objects on node B (one in the kernel, and one in
userspace) and a connect event.
After several IB CM REQ retries, node A tires of waiting for node B to respond
to the REQ and gives up.
Node A sends an IB CM REJ to node B to fail the connection.
The reject is part of the IB CM protocol, and the event is exposed to the user.
If the userspace application retrieves the reject event before calling
rdma_accept, it ends up dereferencing a NULL pointer because it hasn't set the
context for the new connection yet. It currently expects a valid context to be
passed up from the kernel so that it can locate its connection object.
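
To make the failure mode concrete, here is a minimal sketch in C of an event handler that checks the context before using it. All of the names (struct conn, struct event, handle_event, the EV_* values) are hypothetical and invented for illustration. They are not the librdmacm or kernel interfaces, and this is not necessarily the fix proposed in the RFC. It only assumes, as described above, that the context stays NULL until userspace has set it for the new connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types, loosely modelled on the flow described above. */
enum ev_type { EV_CONNECT_REQUEST, EV_REJECTED };

/* Hypothetical userspace connection object. */
struct conn {
    int rejected;   /* set to 1 once a reject has been processed */
};

/*
 * Hypothetical event as handed up from the kernel. 'context' is whatever
 * userspace registered for the connection; it is assumed to remain NULL
 * until userspace has set it (here: until the accept step).
 */
struct event {
    enum ev_type type;
    struct conn *context;
};

/*
 * Guarded handler. Returns 0 if the event was handled, or -1 if a reject
 * arrived before the context was set and therefore could not be matched
 * to a userspace connection object.
 */
int handle_event(const struct event *ev)
{
    assert(ev != NULL);

    if (ev->type == EV_REJECTED) {
        /* Without this check, the next line dereferences NULL. */
        if (ev->context == NULL)
            return -1;
        ev->context->rejected = 1;
    }
    return 0;
}
```

A reject event with a NULL context makes handle_event return -1 instead of crashing, while the same event with a valid context marks the connection as rejected. Dropping the early event is only one possible policy; the new userspace object would still need to be cleaned up by some other path.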