[ofa-general] Re: RDMA/iwarp CM question
Steve Wise
swise at opengridcomputing.com
Wed Sep 12 12:33:04 PDT 2007
Kanoj Sarcar wrote:
> Response to original mail did not come to me, but I
> see it in the archives, responding back to the
> archived response. Please reply all on your responses.
>
I did reply to all. My outgoing folder shows that it went to both of
your addresses...
> If the driver detaches the incoming (child) connection
> request from the listener at the point of sending the
> IW_CM_EVENT_CONNECT_REQUEST upcall, then for on-card
> connection clean up and child state cleanup in driver,
> OFA must guarantee that an accept/reject downcall will
> be made in the future.
Or you can time it out in your driver.
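A rough sketch of what I mean by timing it out, as a standalone model rather than real driver code. Everything here (pending_table, pending_add, pending_complete, pending_sweep, the tick-based deadline) is made up for illustration, and it ignores tick-counter wraparound and locking, both of which a real driver would have to handle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* One detached child connection awaiting an accept/reject downcall. */
struct pending_req {
    bool in_use;
    unsigned long deadline;  /* tick count at which the request goes stale */
};

struct pending_table {
    struct pending_req slots[MAX_PENDING];
};

/* Record a request when the CONNECT_REQUEST upcall is sent.
 * Returns the slot index, or -1 if the table is full. */
static int pending_add(struct pending_table *t, unsigned long now,
                       unsigned long timeout)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = true;
            t->slots[i].deadline = now + timeout;
            return i;
        }
    }
    return -1;
}

/* Accept or reject arrived, so the request no longer needs a timeout. */
static void pending_complete(struct pending_table *t, int slot)
{
    assert(slot >= 0 && slot < MAX_PENDING);
    t->slots[slot].in_use = false;
}

/* Periodic sweep: release every request whose deadline has passed.
 * A real driver would also tear down the on-card connection here.
 * Returns the number of requests reaped. */
static int pending_sweep(struct pending_table *t, unsigned long now)
{
    int reaped = 0;

    for (int i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use && now >= t->slots[i].deadline) {
            t->slots[i].in_use = false;
            reaped++;
        }
    }
    return reaped;
}
```

The idea is that the driver calls pending_add() when it sends the upcall, pending_complete() from its accept and reject entry points, and pending_sweep() from a periodic timer, so a request the upper layers never answer still gets cleaned up.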
>
>
> I don't believe that guarantee currently exists. There
> is exactly one failure point in the call chain
> cm_work_handler():process_event():cm_conn_req_handler()
> at which the driver reject interface is invoked, but at
> multiple other failure points, this is not done.
>
>
> Also, looking at ucma.c, on destruction of a listener,
> I believe ucma_cleanup_events() will go around killing
> all pending IW_CM_EVENT_CONNECT_REQUEST requests, so
> the app will never get a chance to do the
> accept/reject.
>
>
It looks to me like ucma_cleanup_events() calls rdma_destroy_id() /
iw_destroy_cm_id() / destroy_cm_id() which calls the provider reject
function. Or NOT! :) There's a comment in the IW_CM_STATE_CONN_RECV
case inside destroy_cm_id():
> /*
> * App called destroy before/without calling accept after
> * receiving connection request event notification or
> * returned non zero from the event callback function.
> * In either case, must tell the provider to reject.
> */
But I don't see the call to reject the connection...
Maybe you could add it and see if it clears up your issue?
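To make the suggestion concrete, here is a standalone model of the behavior the comment describes. It is not the iwcm code: the model_* types and functions are hypothetical stand-ins, there is no locking, and the only point is that destroying an id that is still in the "connect request received" state invokes the provider's reject hook exactly once:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the iWARP CM states; not the kernel definitions. */
enum model_state {
    MODEL_IDLE,
    MODEL_CONN_RECV,   /* connect request delivered, no accept/reject yet */
    MODEL_DESTROYING
};

struct model_cm_id;

/* Provider (driver) entry points, reduced to the one that matters here. */
struct model_provider_ops {
    int (*reject)(struct model_cm_id *id, const void *pdata,
                  unsigned char pdata_len);
};

struct model_cm_id {
    enum model_state state;
    const struct model_provider_ops *ops;
    int reject_calls;  /* bookkeeping for this example only */
};

/* Hypothetical provider reject; a real driver frees on-card state here. */
static int model_reject(struct model_cm_id *id, const void *pdata,
                        unsigned char pdata_len)
{
    (void)pdata;
    (void)pdata_len;
    id->reject_calls++;
    return 0;
}

static const struct model_provider_ops model_ops = { .reject = model_reject };

/* Destroy path: a request that was never accepted or rejected is
 * rejected on the app's behalf before the id goes away. */
static void model_destroy_cm_id(struct model_cm_id *id)
{
    assert(id != NULL && id->ops != NULL);

    switch (id->state) {
    case MODEL_CONN_RECV:
        id->state = MODEL_DESTROYING;
        id->ops->reject(id, NULL, 0);
        break;
    default:
        id->state = MODEL_DESTROYING;
        break;
    }
}
```

In the real destroy_cm_id() the state lock would have to be dropped around the provider call if the provider can sleep or call back into the CM; that part is left out of the model.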
> Doesn't this sound like a problem (namely
> provider/card resource leak due to races with listener
> destruct)?
>
It does.
But MPA mandates a timeout so the connections will get aborted
eventually by the provider or peer...
But I think you've found a bug...
Steve.