[openib-general] RFC fix for userspace rdma cm crashes

Sean Hefty mshefty at ichips.intel.com
Thu Jan 4 14:06:29 PST 2007


> I'm missing something here. How does a second event "for the
> same connection" occur before the connection does?
> 
> That is, if the user has not accepted the connection request,
> how can a later packet be related to the "same connection"?
> 
> If this is related strictly to retransmitted connection requests,
> then should userspace be involved? Should that be handled by
> the kernel connection managers and/or the device specific verbs?

Here's the actual problem that we see, which hopefully explains the issue better:

1. Node A sends an IB CM REQ to node B.
2. This results in new connection objects on node B (one in the kernel, and
   one in userspace) and a connect event.
3. After several IB CM REQ retries, node A tires of waiting for node B to
   respond to the REQ and gives up.
4. Node A sends an IB CM REJ to node B to fail the connection.

The reject is part of the IB CM protocol, and the event is exposed to the user.
If the userspace application retrieves the reject event before calling
rdma_accept, it ends up dereferencing a NULL pointer because it hasn't set the
context for the new connection yet. It currently expects a valid context passed
up from the kernel in order to locate its connection object.
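
To make the failure mode concrete, here is a minimal sketch in C. It is not the
librdmacm code and not the proposed fix: the types (struct conn, struct
cm_event), the event names, and handle_event() are all hypothetical stand-ins.
It only assumes that an event carries a context pointer that stays NULL until
userspace has registered its connection object, and shows one possible guard
against a reject that arrives first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the librdmacm types. */
enum ev_type { EV_CONNECT_REQUEST, EV_REJECTED };

struct conn {                 /* userspace connection object */
    int rejected;
};

struct cm_event {
    enum ev_type type;
    struct conn *context;     /* assumed NULL until userspace sets it */
};

/*
 * Returns 0 if the event was handled, or -1 if it was dropped because it
 * carried no context to locate the connection object with.
 */
int handle_event(const struct cm_event *ev)
{
    assert(ev != NULL);

    /* A reject seen before the context is set has nothing to look up.
     * Without this check, the dereference below would hit a NULL pointer. */
    if (ev->context == NULL && ev->type != EV_CONNECT_REQUEST)
        return -1;

    if (ev->type == EV_REJECTED)
        ev->context->rejected = 1;

    return 0;
}
```

In this sketch an early reject (context still NULL) is reported back as -1
rather than dereferenced, while a reject with a valid context marks the
connection object as rejected. Whether dropping the event is acceptable, or
whether the context should instead be made valid earlier, is a design question
the sketch does not settle.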

- Sean



