[openib-general] RFC fix for userspace rdma cm crashes
Caitlin Bestler
caitlinb at broadcom.com
Thu Jan 4 14:28:34 PST 2007
openib-general-bounces at openib.org wrote:
>> Does this affect kernel rdma cm users too in some way? If not, then
>> perhaps the place to fix it is in userspace.
>
> This doesn't affect the kernel rdma_cm. The fix can either
> go into the rdma_ucm (not rdma_cm), or librdmacm, either of
> which limit the impact to userspace clients only.
>
> Here's one issue with putting the fix in the librdmacm:
> there's no guarantee that a multi-threaded client will
> process the event for a new connection before the reject.
> This makes me lean toward implementing the fix in the kernel,
> where the events are serialized.
>
With transport-neutral semantics that also makes sense.
The reason the passive-side user is involved in connection
setup is to approve the connection (and specify the QP). The
extra events you cited do not in fact require any user-mode
response under the transport-neutral logic. The kernel can
flag the connection request as "already aborted" and simply
wait for the consumer to accept/reject. There is no real need
to interrupt the consumer to tell them about this problem NOW
(and the fact that the consumer is taking time to answer might
indicate that the consumer is already quite busy).

I'm not as familiar with all the corner cases in IB-specific
connection establishment, but if there is no user context,
is there really a need to forward the event to userspace
as long as the reject is not forgotten?
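
A minimal sketch of the "flag it and wait" idea, written as a
self-contained C model rather than actual rdma_ucm code. Every name
here (struct conn_req, conn_req_remote_abort, and so on) is
hypothetical, and returning -ECONNABORTED from the accept path is
only an assumption about how a stale request might be reported:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pending connection request. */
enum req_state { REQ_PENDING, REQ_ACCEPTED, REQ_REJECTED };

struct conn_req {
    enum req_state state;
    bool aborted;       /* peer gave up before the consumer answered */
    int events_queued;  /* events delivered to userspace so far */
};

void conn_req_init(struct conn_req *req)
{
    assert(req != NULL);
    req->state = REQ_PENDING;
    req->aborted = false;
    req->events_queued = 1; /* the initial connect-request event */
}

/*
 * The remote side aborts while the consumer has not answered yet.
 * Only the flag is recorded; no extra event is queued to userspace.
 */
void conn_req_remote_abort(struct conn_req *req)
{
    assert(req != NULL);
    if (req->state == REQ_PENDING)
        req->aborted = true;
}

/* Consumer accepts: the remembered abort surfaces here, not earlier. */
int conn_req_accept(struct conn_req *req)
{
    assert(req != NULL);
    if (req->state != REQ_PENDING)
        return -EINVAL;
    if (req->aborted) {
        req->state = REQ_REJECTED;
        return -ECONNABORTED;
    }
    req->state = REQ_ACCEPTED;
    return 0;
}

/* Consumer rejects: if the peer already aborted, there is nothing to send. */
int conn_req_reject(struct conn_req *req)
{
    assert(req != NULL);
    if (req->state != REQ_PENDING)
        return -EINVAL;
    req->state = REQ_REJECTED;
    return 0;
}
```

In this model the abort is never forgotten: it is held in the request
and reported the next time the consumer answers, so nothing has to be
forwarded to userspace in between.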