[ofa-general] Question about RDMA CM

Davis, Arlin R arlin.r.davis at intel.com
Tue Sep 16 12:23:18 PDT 2008


  
>
>Random notes:
>
>1. Open MPI uses a progress thread in its OpenFabrics support for the  
>RDMA CM.  rdma_connect() is initiated from the main thread, but all  
>other events are handled from the progress thread.
>
>2. Due to some uninteresting race conditions, we only allow  
>connections to be made "in one direction" (the lower (IP address,  
>port) tuple is the initiator).  If the "wrong" MPI process desires to  
>make a connection, it makes a bogus QP and initiates an  
>rdma_connect().  The receiver process then gets the CONNECT_REQUEST  
>event, detects that the connection is coming the "wrong" way,  
>initiates the connection in the "right" direction, and then rejects  
>the "wrong" connection.  The initiator expects the rejection, and  
>simply waits for the CONNECT_REQUEST coming in the other direction.
>
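
For reference, the direction rule in note 2 can be sketched as below. This is only a minimal illustration under stated assumptions, not Open MPI's actual code: the `endpoint` struct and the `endpoint_cmp`, `local_is_initiator`, and `make_endpoint` names are hypothetical. It handles IPv4 only and assumes "lower" means numerically lower address, with the port as tie-breaker.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical endpoint descriptor: IPv4 address and port, both kept in
 * network byte order, as they would appear in a struct sockaddr_in. */
struct endpoint {
    struct in_addr addr;
    in_port_t      port;
};

/* Compare two endpoints as (address, port) tuples in host byte order.
 * Returns -1, 0, or 1 when a is lower than, equal to, or higher than b. */
int endpoint_cmp(const struct endpoint *a, const struct endpoint *b)
{
    uint32_t a_ip = ntohl(a->addr.s_addr);
    uint32_t b_ip = ntohl(b->addr.s_addr);
    if (a_ip != b_ip)
        return a_ip < b_ip ? -1 : 1;

    uint16_t a_port = ntohs(a->port);
    uint16_t b_port = ntohs(b->port);
    if (a_port != b_port)
        return a_port < b_port ? -1 : 1;
    return 0;
}

/* Nonzero if the local side holds the lower tuple and should therefore
 * be the one whose rdma_connect() is accepted. */
int local_is_initiator(const struct endpoint *local,
                       const struct endpoint *remote)
{
    return endpoint_cmp(local, remote) < 0;
}

/* Illustration helper: build an endpoint from a dotted-quad string and a
 * host-order port. An unparsable address is left as 0.0.0.0. */
struct endpoint make_endpoint(const char *ip, uint16_t port)
{
    struct endpoint e;
    memset(&e, 0, sizeof(e));
    inet_pton(AF_INET, ip, &e.addr);
    e.port = htons(port);
    return e;
}
```

A receiver handling a CONNECT_REQUEST could call `local_is_initiator()` with its own and the requester's tuples; a nonzero result means the request arrived the "wrong" way and should be rejected after the reverse connect has been started.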

Do you use rdma_cm to create QPs? If so, you have to be careful
about reusing a cm_id's QP after a rejection or any other connection
event error. I'm not sure from your note, but if, as a shortcut, you
move the rejected initiator cm_id's QP to the new cm_id created from
the incoming "right"-direction connect request (CR), you may have
problems.

Also, do you validate the cm_id context and the remote/local addresses
in your CM processing thread? Could you be getting misled on the
established event and sending to a QP that has not yet preposted its
receives? I guess you would see other QP errors in that case.
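
To make the validation question concrete, here is a rough sketch of the kind of check I mean. It is an assumption-laden illustration, not code from any existing stack: `conn_ctx`, `sockaddr_in_equal`, and `established_event_is_valid` are hypothetical names, and only IPv4 is handled. In a real handler the `peer` argument would presumably come from `rdma_get_peer_addr()` on the event's cm_id, and `ctx` from that cm_id's `context` field.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical per-connection state that the application would store in
 * the cm_id's context pointer when it starts or accepts a connection. */
struct conn_ctx {
    struct sockaddr_in expected_peer;  /* peer we believe we are talking to */
};

/* Nonzero if both addresses are IPv4 and have the same address and port.
 * Both are assumed to be in network byte order, so the raw fields can be
 * compared directly. */
int sockaddr_in_equal(const struct sockaddr *a, const struct sockaddr *b)
{
    if (a->sa_family != AF_INET || b->sa_family != AF_INET)
        return 0;

    const struct sockaddr_in *x = (const struct sockaddr_in *)a;
    const struct sockaddr_in *y = (const struct sockaddr_in *)b;
    return x->sin_addr.s_addr == y->sin_addr.s_addr &&
           x->sin_port == y->sin_port;
}

/* Check to run when an established event arrives, before any send is
 * posted: the context must exist and the reported peer must be the one
 * recorded for this connection. */
int established_event_is_valid(const struct conn_ctx *ctx,
                               const struct sockaddr *peer)
{
    if (ctx == NULL || peer == NULL)
        return 0;
    return sockaddr_in_equal((const struct sockaddr *)&ctx->expected_peer,
                             peer);
}
```

If the check fails, the event belongs to a different connection than the handler assumed, and nothing should be sent on that QP.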

-arlin 


