[ofa-general] librdmacm feature request

Mon Oct 8 10:04:45 PDT 2007

> 1)  When you listen for connections, the event includes a new cm_id
> struct attached to the listen event channel.  Attempts to change this
> channel make the cm_id unusable (rdma_create_qp fails).  This is
> suboptimal in situations where you want the listen channel to produce
> listen events only.  A function such as rdma_modify_channel(cm_id,
> new_channel) would solve this.
> 
> 2)  When you create a new cm_id with the intent of connecting to another
> machine, it is again desirable to get the events related to establishing
> the connection on a separate channel from the events related to already
> established connections.  Among other things, if you share a channel
> with a different thread that is responsible for tearing down
> connections on error, then which thread gets the ADDR_RESOLVED or
> ROUTE_RESOLVED event is up in the air.  To make sure it gets delivered
> properly, I currently have the connecting thread pthread_mutex_lock the
> connection construct, set connection->cm_waiting = 1, issue the
> rdma_resolve_route, and then pthread_mutex_lock again so that it blocks
> on its own lock.  The other thread gets the event, checks
> connection->cm_waiting == 1, and if true places the event pointer in
> connection->event, clears connection->cm_waiting, and
> pthread_mutex_unlock's the connection.  How gross is that?  So, using a
> separate event channel up until the connection is established, then
> calling rdma_modify_channel(), would also solve this problem.

Thanks for the feedback.  I'll give this some thought and see how 
difficult it is to add an rdma_modify_channel() routine.
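
Until something like that exists, the handoff described in item 2 could
probably be done with a condition variable instead of locking the same
mutex twice.  Relocking a default pthread mutex from the owning thread, or
unlocking it from another thread, is not behavior POSIX guarantees.  The
following is only a sketch under assumptions: the struct layout and the
conn_* helper names are hypothetical, the event is carried as an opaque
pointer, and no librdmacm calls are shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-connection state; the field names follow the
 * description in item 2, the rest is assumed for illustration. */
struct connection {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             cm_waiting;  /* connecting thread expects an event */
    void           *event;       /* event handed over by the other thread */
};

static void conn_init(struct connection *c)
{
    int rc = pthread_mutex_init(&c->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&c->cond, NULL);
    assert(rc == 0);
    c->cm_waiting = 0;
    c->event = NULL;
}

/* Connecting thread: call this before issuing the resolve request, so
 * the event cannot arrive before the flag is set. */
static void conn_expect_event(struct connection *c)
{
    pthread_mutex_lock(&c->lock);
    c->cm_waiting = 1;
    pthread_mutex_unlock(&c->lock);
}

/* Connecting thread: block until the other thread hands over an event. */
static void *conn_wait_event(struct connection *c)
{
    void *ev;

    pthread_mutex_lock(&c->lock);
    while (c->event == NULL)
        pthread_cond_wait(&c->cond, &c->lock);
    ev = c->event;
    c->event = NULL;
    pthread_mutex_unlock(&c->lock);
    return ev;
}

/* Event thread: returns 1 if the event was handed to a waiting
 * connecting thread, 0 if nobody was waiting and the caller should
 * handle the event itself. */
static int conn_deliver_event(struct connection *c, void *ev)
{
    int handed_off = 0;

    pthread_mutex_lock(&c->lock);
    if (c->cm_waiting) {
        c->event = ev;
        c->cm_waiting = 0;
        handed_off = 1;
        pthread_cond_signal(&c->cond);
    }
    pthread_mutex_unlock(&c->lock);
    return handed_off;
}
```

The connecting thread would call conn_expect_event(), issue its
rdma_resolve_route, and then call conn_wait_event().  The thread reading
the shared channel would call conn_deliver_event() for each event and
process the event itself when that returns 0.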

> 3)  The man pages on rdma_connect() and rdma_accept() aren't really
> clear on the role of the connection parameters struct that gets passed
> in.  Specifically, they don't say whether or not the initiator_depth and
> responder_resources in the parm struct present in the listen event are
> what the other side set, or if they are already swapped to indicate the
> minimum/maximum that we can set on our side of the connection.  Also,
> the initial message pointer is not detailed.  When we call
> rdma_accept/rdma_reject, does our parm struct need to have that same
> pointer?  Do we need to free that mem?  Can we supply a new initial
> message and not leak the memory associated with the incoming initial
> message?

I'll update the man pages to answer your questions.

- Sean