[ofa-general] Question about RDMA CM
Davis, Arlin R
arlin.r.davis at intel.com
Tue Sep 16 12:23:18 PDT 2008
>
>Random notes:
>
>1. Open MPI uses a progress thread in its OpenFabrics support for the
>RDMA CM. rdma_connect() is initiated from the main thread, but all
>other events are handled from the progress thread.
>
>2. Due to some uninteresting race conditions, we only allow
>connections to be made "in one direction" (the lower (IP address,
>port) tuple is the initiator). If the "wrong" MPI process desires to
>make a connection, it makes a bogus QP and initiates an
>rdma_connect(). The receiver process then gets the CONNECT_REQUEST
>event, detects that the connection is coming the "wrong" way,
>initiates the connection in the "right" direction, and then rejects
>the "wrong" connection. The initiator expects the rejection, and
>simply waits for the CONNECT_REQUEST coming in the other direction.
>
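For reference, the "one direction" rule in note 2 could be sketched
roughly as below. This is only an illustration, not Open MPI's actual
code: the helper names (endpoint_cmp, i_am_initiator, make_endpoint)
are hypothetical, and the choice to compare IPv4 address first and
port second, both in host byte order, is an assumption about what
"lower tuple" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: order two IPv4 (address, port) tuples.
 * Assumption: address is compared first, then port, in host byte order.
 * Returns <0 if a is lower, >0 if b is lower, 0 if they are equal. */
static int endpoint_cmp(const struct sockaddr_in *a,
                        const struct sockaddr_in *b)
{
    assert(a != NULL && b != NULL);

    uint32_t ia = ntohl(a->sin_addr.s_addr);
    uint32_t ib = ntohl(b->sin_addr.s_addr);
    if (ia != ib)
        return ia < ib ? -1 : 1;

    uint16_t pa = ntohs(a->sin_port);
    uint16_t pb = ntohs(b->sin_port);
    if (pa != pb)
        return pa < pb ? -1 : 1;

    return 0;
}

/* The side with the lower tuple initiates; the other side only accepts. */
static bool i_am_initiator(const struct sockaddr_in *local,
                           const struct sockaddr_in *remote)
{
    return endpoint_cmp(local, remote) < 0;
}

/* Hypothetical convenience for building an endpoint from a dotted quad. */
static struct sockaddr_in make_endpoint(const char *ip, uint16_t port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    int rc = inet_pton(AF_INET, ip, &sa.sin_addr);
    assert(rc == 1);
    (void)rc;
    return sa;
}
```

With a check like this, the receiver of a CONNECT_REQUEST can call
i_am_initiator(local, remote) to decide whether the request arrived
the "wrong" way and should be rejected after it starts its own
connect.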
Do you use rdma_cm to create QPs? If so, you have to be careful
about reusing a cm_id's QP after a rejection or any other connection
event error. I can't tell from your note, but if, as a shortcut, you
move the rejected initiator cm_id's QP to the new cm_id created from
the "right"-direction CONNECT_REQUEST coming in, you may have problems.
Also, do you validate the cm_id context and the remote/local addresses
in your CM processing thread? Could you be getting misled by the
established event and sending to a QP that has not yet had receives
preposted? I guess you would see other QP errors in that case.
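
The kind of validation I mean could look something like the sketch
below. It is a minimal illustration under stated assumptions, not code
from any real stack: struct conn_ctx, CONN_CTX_MAGIC, and the helper
names are hypothetical, and it handles IPv4 only. In a real event
handler the three arguments would come from the event's cm_id, i.e.
id->context, rdma_get_local_addr(id), and rdma_get_peer_addr(id).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical tag used to detect a stale or foreign context pointer. */
#define CONN_CTX_MAGIC 0x434d4944u

/* Hypothetical per-connection state stored as the cm_id context. */
struct conn_ctx {
    uint32_t magic;
    struct sockaddr_in local;   /* address we expect to be bound to */
    struct sockaddr_in remote;  /* peer we expect to be talking to  */
};

/* Compare family, address, and port (all kept in network byte order). */
static bool same_endpoint(const struct sockaddr_in *a,
                          const struct sockaddr_in *b)
{
    return a->sin_family == b->sin_family &&
           a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}

/* Return true only if the event's context and addresses match what we
 * recorded for this connection; otherwise the event should be ignored
 * or treated as an error rather than used to start sending. */
static bool event_matches_ctx(const void *context,
                              const struct sockaddr_in *ev_local,
                              const struct sockaddr_in *ev_remote)
{
    const struct conn_ctx *ctx = context;

    if (ctx == NULL || ev_local == NULL || ev_remote == NULL)
        return false;
    if (ctx->magic != CONN_CTX_MAGIC)
        return false;
    return same_endpoint(&ctx->local, ev_local) &&
           same_endpoint(&ctx->remote, ev_remote);
}

/* Hypothetical convenience for building an endpoint from a dotted quad. */
static struct sockaddr_in ep4(const char *ip, uint16_t port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    int rc = inet_pton(AF_INET, ip, &sa.sin_addr);
    assert(rc == 1);
    (void)rc;
    return sa;
}
```

Running a check like this before acting on an established event would
at least catch the case where the event belongs to the rejected
"wrong"-direction cm_id rather than the one whose QP has receives
posted.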
-arlin