[ofa-general] peer to peer connections support
Or Gerlitz
ogerlitz at voltaire.com
Wed Dec 19 03:33:48 PST 2007
Sean Hefty wrote:
> Peer to peer connection was never fully implemented in the ib_cm. I
> don't think it would be that hard to implement at that level, and it
> shouldn't require API changes.
Given your comment below that the "CM needs to know the connection model
selected by the app", I am somewhat confused. Reading your other
comments, I see two options here, depending on whether the implementation
differentiates between peer-to-peer SIDs and client/server SIDs:
1. If there's no difference, then in the peer-to-peer model, too, the
application must first tell the CM to listen on a SID, and it's up to the
CM to break the symmetry and decide who sends the REP and who ignores
the REQ.
2. If there is a difference, then peer-to-peer SIDs are in a different
domain than client/server SIDs.
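To make option 1 concrete, here is a minimal sketch of the symmetry-breaking rule a CM could apply when it has already sent a REQ for a SID and then receives the peer's REQ for the same SID. The struct, enum and function names are hypothetical (they are not part of the ib_cm API), and using the port GUID with the QP number as the tie-breaker is an assumption for illustration, not necessarily the rule the IB spec prescribes for peer-to-peer connections.

```c
#include <stdint.h>

/* Hypothetical identity of one endpoint of a peer-to-peer connection.
 * The (GUID, QPN) tie-break key is an assumption for this sketch. */
struct p2p_endpoint {
    uint64_t guid;   /* e.g. local port GUID */
    uint32_t qpn;    /* local queue pair number */
};

/* ACTIVE: ignore the peer's REQ and wait for a REP to our own REQ.
 * PASSIVE: answer the peer's REQ with a REP. */
enum p2p_role { P2P_ACTIVE, P2P_PASSIVE };

/* Called by a CM that has sent a REQ for a SID and then receives a REQ
 * for the same SID from the peer. Both sides run the same comparison
 * with the arguments swapped, so exactly one side becomes passive. */
static enum p2p_role p2p_resolve_role(const struct p2p_endpoint *local,
                                      const struct p2p_endpoint *remote)
{
    if (local->guid != remote->guid)
        return local->guid > remote->guid ? P2P_PASSIVE : P2P_ACTIVE;

    /* Same GUID (e.g. loopback on one port): two distinct QPs on the
     * same port have different QP numbers, so this comparison decides. */
    return local->qpn > remote->qpn ? P2P_PASSIVE : P2P_ACTIVE;
}
```

Because both sides evaluate the same comparison with the arguments swapped, they agree on who sends the REP without any extra message exchange.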
> Support at the rdma_cm level may require an API change. There's no easy
> way for the rdma_cm to know if it should invoke the IB peer-to-peer
> connection model. I'm not even sure how one peer would know the other
> peer's port number, unless well known ports are used on both sides.
Why should there be a difference between the rdma-cm and the cm? If the
cm can support this model without an API change, wouldn't the same apply
to the rdma-cm?
>> Such support would be useful in symmetric schemes such as MPI
>> implementations that open connections on demand, and in other
>> applications where each party can both accept and initiate connections.
>> For example, I understand that some work is now being done in the Open
>> MPI community to use the rdma-cm as a possible channel for connection
>> establishment.
> I would need to better understand the expected usage model, like how the
> peers find each other, but this is something that could be added if needed.
I think that in the MPI world each rank gets a SID from the local CM, the
ranks exchange the SIDs out-of-band, and then connections are opened. In a
connection-on-demand scheme, whenever a rank process calls mpi_send() to a
peer for which the local MPI library does not yet have a connection, it
tries to connect. So if this happens simultaneously on both sides of some
pair of ranks, there should be a way to form one connection out of these
two connection requests. My thinking/motivation is that support for this
scheme belongs at the IB stack level (cm and rdma-cm) and not at the level
of the specific MPI implementation.
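The on-demand case can be sketched as a small per-peer state machine. This is a hypothetical illustration, not ib_cm, rdma_cm or Open MPI code: it assumes each side knows both ranks, uses "the higher rank answers" as the tie-break, and omits the RTU, timeouts and error paths. Whether it lives in the MPI library or in the cm/rdma-cm, this is roughly the logic needed to collapse two crossing REQs into one connection.

```c
/* Hypothetical per-peer connection state kept by one rank. */
enum conn_state { CONN_IDLE, CONN_REQ_SENT, CONN_CONNECTED };

/* Action the rank takes in response to an event. */
enum conn_action { ACT_NONE, ACT_SEND_REQ, ACT_SEND_REP, ACT_DROP_REQ };

struct peer_conn {
    int local_rank;
    int remote_rank;
    enum conn_state state;
};

/* Send path: start connecting only if nothing exists or is pending. */
static enum conn_action on_local_send(struct peer_conn *c)
{
    if (c->state == CONN_IDLE) {
        c->state = CONN_REQ_SENT;
        return ACT_SEND_REQ;
    }
    return ACT_NONE; /* already connected, or our REQ is outstanding */
}

/* A REQ arrived from the peer. */
static enum conn_action on_remote_req(struct peer_conn *c)
{
    switch (c->state) {
    case CONN_IDLE:
        c->state = CONN_CONNECTED; /* ordinary passive side */
        return ACT_SEND_REP;
    case CONN_REQ_SENT:
        /* Crossing REQs: the higher rank answers the peer's REQ and uses
         * it as the single connection; the lower rank drops the peer's
         * REQ and keeps waiting for a REP to its own REQ. */
        if (c->local_rank > c->remote_rank) {
            c->state = CONN_CONNECTED;
            return ACT_SEND_REP;
        }
        return ACT_DROP_REQ;
    default:
        return ACT_DROP_REQ; /* stray REQ on an established connection */
    }
}

/* A REP arrived for the REQ this rank sent. */
static enum conn_action on_remote_rep(struct peer_conn *c)
{
    if (c->state == CONN_REQ_SENT)
        c->state = CONN_CONNECTED;
    return ACT_NONE;
}
```

For ranks 0 and 1 sending at once, both send a REQ; rank 0 drops rank 1's REQ, rank 1 answers rank 0's REQ with a REP, and both end up on a single connection. A real implementation would also have to release the resources tied to the higher rank's abandoned REQ.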
Jeff, Jon, any comments?
Or.