[ofa-general] peer to peer connections support
Kanevsky, Arkady
Arkady.Kanevsky at netapp.com
Thu Dec 20 07:48:29 PST 2007
So, in a nutshell, the proposal is to add
an identifier to the "CM private data" that indicates the
peer-to-peer model, plus unique peer IDs for the requested
connection.
Is this the model?
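To make the question concrete, here is a minimal sketch of what such private data could look like. Everything in it is an assumption for illustration: the struct name, the marker value, the 64-bit ID width, and the big-endian layout are hypothetical and are not defined by the IB CM or the RDMA-CM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker and wire size; not defined by any IB or RDMA-CM spec. */
#define P2P_MAGIC    0x50325031u
#define P2P_WIRE_LEN 20u

/* Hypothetical contents of the "CM private data" for a peer-to-peer REQ. */
struct p2p_private_data {
    uint32_t magic;      /* marks the request as peer-to-peer */
    uint64_t local_id;   /* unique ID of the peer sending the REQ */
    uint64_t remote_id;  /* unique ID of the peer it wants to reach */
};

/* Store the low nbytes of v at p in big-endian order. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read nbytes from p as a big-endian integer. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Serialize pd into buf. Returns bytes written, or 0 if buf is too small. */
size_t p2p_pack(const struct p2p_private_data *pd, uint8_t *buf, size_t len)
{
    if (len < P2P_WIRE_LEN)
        return 0;
    put_be(buf, pd->magic, 4);
    put_be(buf + 4, pd->local_id, 8);
    put_be(buf + 12, pd->remote_id, 8);
    return P2P_WIRE_LEN;
}

/* Parse buf into pd. Returns 1 if it carries the marker, 0 otherwise. */
int p2p_unpack(const uint8_t *buf, size_t len, struct p2p_private_data *pd)
{
    if (len < P2P_WIRE_LEN || get_be(buf, 4) != P2P_MAGIC)
        return 0;
    pd->magic = P2P_MAGIC;
    pd->local_id = get_be(buf + 4, 8);
    pd->remote_id = get_be(buf + 12, 8);
    return 1;
}
```

In this sketch a receiver would call p2p_unpack() on the private data of an incoming REQ and treat a return of 0 as an ordinary client/server request.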
Thanks,
Arkady Kanevsky email: arkady at netapp.com
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300
> -----Original Message-----
> From: Or Gerlitz [mailto:ogerlitz at voltaire.com]
> Sent: Thursday, December 20, 2007 10:09 AM
> To: Sean Hefty
> Cc: OpenFabrics General
> Subject: Re: [ofa-general] peer to peer connections support
>
> Sean Hefty wrote:
> ...
> > I didn't follow this.
> ...
> > Peer to peer SIDs are in a different domain than client/server SIDs,
> > and the peer_to_peer field is used to indicate which domain a SID is in.
>
> Sorry if I wasn't clear, let me see if I understand you: with
> this different-domain implementation, under client/server
> the passive side calls cm listen and the active side calls
> cm connect, whereas under peer-to-peer both sides call cm
> listen and later both sides, or only one side, may call cm
> connect, correct?
>
> > To add to my comments on the CM API, struct ib_cm_req_param, which is
> > used to send the REQ, includes service_id and peer_to_peer fields.
> > The latter is a boolean used by the CM to distinguish if incoming REQs
> > can be matched with the outgoing REQ.
>
> OK, this makes things clearer.
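A minimal sketch of the matching rule as described above, assuming that a match requires both REQs to be flagged peer-to-peer and to carry the same service ID. The struct toy_req and the function toy_req_matches() are hypothetical stand-ins for illustration; this is not the kernel's struct ib_cm_req_param or the CM's actual matching code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the two fields named above; NOT struct ib_cm_req_param. */
struct toy_req {
    uint64_t service_id;   /* SID the request is addressed to */
    bool     peer_to_peer; /* which SID domain the request lives in */
};

/*
 * Assumed rule: an incoming REQ can be paired with a pending outgoing REQ
 * only if both are in the peer-to-peer domain and use the same SID.
 */
bool toy_req_matches(const struct toy_req *outgoing,
                     const struct toy_req *incoming)
{
    return outgoing->peer_to_peer && incoming->peer_to_peer &&
           outgoing->service_id == incoming->service_id;
}
```

Under this assumption a client/server REQ never matches a peer-to-peer REQ, even when the SID values are numerically equal, which is one way to read "a different domain".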
>
> >> Why should there be a difference between the rdma-cm and the cm? If
> >> the cm has a model without an API change, wouldn't it also apply to
> >> the rdma-cm?
>
> > The rdma_cm does not know how to set the peer_to_peer field in the
> > ib_cm_req_param. It sets this field to 0 today.
>
> But it could set it to one as well. Assuming my
> understanding above of the suggested implementation is
> correct, we can change the RDMA-CM API to let users specify
> on rdma_connect that they want peer-to-peer support. Such
> apps can then issue an rdma_listen call and later call
> rdma_connect with this bit set, and they are done (or almost
> done... I guess there is more devil in the details here,
> isn't there?)
>
> >> I think that in the MPI world each rank gets a SID from the local CM
> >> and they exchange the SIDs out-of-band, then connections are opened.
> >> If it's a connection-on-demand scheme, then whenever the rank process
> >> calls mpi_send() to a peer for which the local MPI library does not
> >> have a connection, it tries to connect. So if this happens "at once"
> >> between some pair of ranks, there should be a way to form one
> >> connection out of these two connecting requests. My thinking/motivation
> >> is that support of this scheme should be in the IB stack (cm and
> >> rdma-cm) level and not in the specific MPI implementation level.
> >
> > Are the out of band connections used by MPI formed using client/server
> > or peer to peer? I believe that Intel MPI has each rank listen for
> > connections from the ranks below it using client/server.
>
> Yes, MPIs that do an all-to-all connect on job start typically
> use client/server, where all the ranks > 0 issue a listen call
> and then all lower ranks connect to higher ranks, or use some
> other symmetry-breaking scheme. I am trying to see what needs
> to be supported by the IB stack to let MPIs that connect
> on demand use the RDMA-CM.
>
> > There are a couple of problems with the peer to peer model. First,
> > unless the connections occur at exactly the same time, they miss
> > connecting (rejected with invalid SID).
>
> This makes the whole peer-to-peer model useless, since an app
> cannot make sure that connections occur at exactly the same
> time! My understanding of the spec is that the peer-to-peer model
> can handle connections that occur at exactly the same time,
> but not only those.
>
> > Second, if multiple peer to
> > peer connections need to form between the same pair of nodes, things
> > can go screwy (that's the technical term) trying to match up the peer
> > requests.
>
> Under MPI each rank uses a different SID, so I think we are
> safe from this problem.
>
> Or