[ofa-general] peer to peer connections support

Sean Hefty mshefty at ichips.intel.com
Wed Dec 19 10:16:34 PST 2007


> With your comment below that "CM needs to know the connection model 
> selected by the app", I am somewhat confused. Reading your other 
> comments, I see two options here, based on whether the implementation 
> differentiates between peer-to-peer SIDs and client/server SIDs:
> 
> If there's no difference, then in the peer-to-peer model too, the 
> application must first tell the CM to listen on a SID, and it's up to the 
> CM to break the symmetry and decide who sends the REP and who ignores 
> the REQ.
> 
> If there is a difference, then peer-to-peer SIDs are in a different domain 
> than client/server SIDs.

I didn't follow this.

To add to my comments on the CM API, struct ib_cm_req_param, which is 
used to send the REQ, includes service_id and peer_to_peer fields.  The 
latter is a boolean used by the CM to determine whether incoming REQs can 
be matched with the outgoing REQ.

Peer to peer SIDs are in a different domain than client/server SIDs, and 
the peer_to_peer field is used to indicate which domain a SID is in.
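
As a rough illustration of the matching described above, here is a minimal 
sketch.  struct req_sketch and req_can_match() are hypothetical names made 
up for this example; only the service_id and peer_to_peer field names come 
from ib_cm_req_param, and the rule shown is an assumption about how the 
matching could work, not the actual ib_cm code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two ib_cm_req_param fields discussed
 * above.  The real kernel struct has many more members and is not
 * reproduced here.
 */
struct req_sketch {
	uint64_t service_id;   /* SID the REQ is addressed to */
	bool     peer_to_peer; /* true: the SID is in the peer-to-peer domain */
};

/*
 * Assumed rule: an incoming REQ may be paired with an outstanding
 * outgoing REQ only if both are flagged peer_to_peer and both carry
 * the same SID.  A client/server REQ never matches an outgoing REQ.
 */
static bool req_can_match(const struct req_sketch *outgoing,
			  const struct req_sketch *incoming)
{
	return outgoing->peer_to_peer && incoming->peer_to_peer &&
	       outgoing->service_id == incoming->service_id;
}
```

With this rule, the same numeric SID used once with peer_to_peer set and 
once without never collides, which is one way to read "a different domain".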

> Why should there be a difference between the rdma-cm and the cm? If 
> the cm has a model that works without an API change, wouldn't it also 
> apply to the rdma-cm?

The rdma_cm does not know how to set the peer_to_peer field in the 
ib_cm_req_param.  It sets this field to 0 today.

> I think that in the MPI world each rank gets a SID from the local CM and 
> they exchange the SIDs out-of-band, then connections are opened. If it's 
> a connection-on-demand scheme, then whenever the rank process calls 
> mpi_send() to a peer for which the local MPI library does not have a 
> connection, it tries to connect. So if this happens "at once" between 
> some pair of ranks, there should be a way to form one connection out of 
> these two connection requests. My thinking/motivation is that support for 
> this scheme should be at the IB stack (cm and rdma-cm) level and not at 
> the specific MPI implementation level.

Are the out of band connections used by MPI formed using client/server 
or peer to peer?  I believe that Intel MPI has each rank listen for 
connections from the ranks below it using client/server.
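
A client/server scheme like that one avoids the symmetry problem by fixing 
the roles up front.  The sketch below is only an illustration: role_for() 
and enum conn_role are hypothetical names, and it assumes "below" means a 
lower rank number.  It is not taken from any MPI implementation.

```c
/* Role a rank plays toward one particular peer. */
enum conn_role {
	ROLE_LISTEN,  /* wait for the peer's REQ (server side) */
	ROLE_CONNECT, /* send the REQ to the peer (client side) */
	ROLE_NONE     /* same rank: no connection needed */
};

/*
 * Assumed convention: the higher-numbered rank listens and the
 * lower-numbered rank connects, so exactly one side of every pair
 * sends a REQ and the two requests can never cross.
 */
static enum conn_role role_for(int my_rank, int peer_rank)
{
	if (my_rank == peer_rank)
		return ROLE_NONE;
	return my_rank > peer_rank ? ROLE_LISTEN : ROLE_CONNECT;
}
```

For ranks 1 and 3, rank 3 gets ROLE_LISTEN and rank 1 gets ROLE_CONNECT, 
regardless of which side wants to send first.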

There are a couple of problems with the peer to peer model.  First, 
unless the connections occur at exactly the same time, they miss 
connecting (rejected with invalid SID).  Second, if multiple peer to 
peer connections need to form between the same pair of nodes, things can 
go screwy (that's the technical term) trying to match up the peer requests.
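
Both problems can be seen in a small sketch of how an incoming peer REQ 
might be classified against the local list of outstanding peer REQs. 
Everything here (struct pending_req, classify_incoming(), the outcome 
names) is hypothetical, and it assumes nobody is listening on the SID; it 
is not the actual CM logic.

```c
#include <stddef.h>
#include <stdint.h>

enum peer_outcome {
	PEER_REJECT_INVALID_SID, /* no outstanding REQ to pair with */
	PEER_MATCHED,            /* exactly one candidate */
	PEER_AMBIGUOUS           /* several candidates: which one wins? */
};

/* One outgoing peer-to-peer REQ that has not been answered yet. */
struct pending_req {
	uint64_t service_id;
	uint64_t remote_node; /* hypothetical identifier of the remote node */
};

/*
 * Count the outstanding REQs that an incoming peer REQ could pair with.
 * Zero hits models the timing problem (the other side has not sent its
 * REQ yet, or has already given up); more than one hit models the
 * multiple-connection problem between the same pair of nodes.
 */
static enum peer_outcome classify_incoming(const struct pending_req *pending,
					   size_t n, uint64_t sid,
					   uint64_t from_node)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (pending[i].service_id == sid &&
		    pending[i].remote_node == from_node)
			hits++;
	}

	if (hits == 0)
		return PEER_REJECT_INVALID_SID;
	return hits == 1 ? PEER_MATCHED : PEER_AMBIGUOUS;
}
```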

- Sean


