[ofa-general] peer to peer connections support

Kanevsky, Arkady Arkady.Kanevsky at netapp.com
Thu Dec 20 07:48:29 PST 2007


So, in a nutshell, the proposal is to add an identifier to the
"CM private data" that indicates the peer-to-peer model,
together with unique peer IDs for the requested connection.

Is this the model?
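
A minimal sketch of what such private data could look like, purely to
make the question concrete. Everything in it is an assumption for
illustration: the magic value, the field layout, and the names
p2p_priv_pack, p2p_priv_unpack and p2p_reqs_cross are hypothetical and
are not part of the IB CM or rdma_cm API. The 92-byte limit is the REQ
private data size as I understand it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values, not defined by the IB spec or the OFA stack. */
#define P2P_PRIV_MAGIC  0x50325031u /* ASCII "P2P1": marks the peer-to-peer model */
#define P2P_PRIV_LEN    20          /* 4-byte magic + two 8-byte peer IDs */
#define CM_REQ_PRIV_MAX 92          /* assumed size of CM REQ private data */

static_assert(P2P_PRIV_LEN <= CM_REQ_PRIV_MAX,
              "header must fit in the REQ private data");

/* Unique IDs of the two peers for one requested connection. */
struct p2p_ids {
    uint64_t local_id;  /* ID of the peer sending the REQ */
    uint64_t remote_id; /* ID of the peer it wants to reach */
};

/* Store the low n bytes of v in big-endian (network) order. */
static void put_be(uint8_t *dst, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Read n big-endian bytes into an integer. */
static uint64_t get_be(const uint8_t *src, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | src[i];
    return v;
}

/* Write the header; returns bytes written, or 0 if buf is too small. */
size_t p2p_priv_pack(uint8_t *buf, size_t len, const struct p2p_ids *ids)
{
    if (len < P2P_PRIV_LEN)
        return 0;
    put_be(buf, P2P_PRIV_MAGIC, 4);
    put_be(buf + 4, ids->local_id, 8);
    put_be(buf + 12, ids->remote_id, 8);
    return P2P_PRIV_LEN;
}

/* Returns true and fills *ids only if the peer-to-peer marker is present. */
bool p2p_priv_unpack(const uint8_t *buf, size_t len, struct p2p_ids *ids)
{
    if (len < P2P_PRIV_LEN || get_be(buf, 4) != P2P_PRIV_MAGIC)
        return false;
    ids->local_id = get_be(buf + 4, 8);
    ids->remote_id = get_be(buf + 12, 8);
    return true;
}

/* An incoming REQ crosses our outgoing REQ when the ID pair is mirrored. */
bool p2p_reqs_cross(const struct p2p_ids *out, const struct p2p_ids *in)
{
    return out->local_id == in->remote_id &&
           out->remote_id == in->local_id;
}
```

With something like this, a receiver could call p2p_priv_unpack() on
the private data of an incoming REQ and, if the marker is present, use
p2p_reqs_cross() against its own pending REQ to decide whether the two
requests describe the same connection.
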
Thanks,

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.               phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
Waltham, MA 02451                   central phone: 781-768-5300
 

> -----Original Message-----
> From: Or Gerlitz [mailto:ogerlitz at voltaire.com] 
> Sent: Thursday, December 20, 2007 10:09 AM
> To: Sean Hefty
> Cc: OpenFabrics General
> Subject: Re: [ofa-general] peer to peer connections support
> 
> Sean Hefty wrote:
> ...
> > I didn't follow this.
> ...
> > Peer to peer SIDs are in a different domain than client/server SIDs,
> > and the peer_to_peer field is used to indicate which domain a SID is in.
> 
> Sorry if I wasn't clear, let me see if I understand you: with
> this different-domain implementation, under client/server the
> passive side calls cm listen and the active side calls cm
> connect, whereas under peer-to-peer both sides call cm listen
> and later either both sides or only one side may call cm
> connect, correct?
> 
> > To add to my comments on the CM API, struct ib_cm_req_param, which is
> > used to send the REQ, includes service_id and peer_to_peer fields.
> > The latter is a boolean used by the CM to distinguish if incoming REQs
> > can be matched with the outgoing REQ.
> 
> OK, this makes things clearer.
> 
> >> Why should there be a difference between the rdma-cm and the cm? If in
> >> the cm you have a model without API change, wouldn't it apply also to
> >> the rdma-cm?
> 
> > The rdma_cm does not know how to set the peer_to_peer field in the 
> > ib_cm_req_param.  It sets this field to 0 today.
> 
> But it could set it to one as well... assuming my
> understanding above of the suggested implementation is
> correct, we can change the RDMA-CM API to let users specify
> on rdma_connect that they want peer-to-peer support, so such
> apps can issue an rdma_listen call and later call rdma_connect
> with this bit set and they are done (or almost done... I
> guess there is some more devil in the details here, isn't there?)
> 
> >> I think that in the MPI world each rank gets a SID from the local CM and
> >> they exchange the SIDs out-of-band, then connections are opened. If it's
> >> a connection-on-demand scheme, then whenever the rank process calls
> >> mpi_send() to a peer for which the local MPI library does not have a
> >> connection, it tries to connect. So if this happens "at once" between
> >> some pair of ranks, there should be a way to form one connection out of
> >> these two connecting requests. My thinking/motivation is that support of
> >> this scheme should be in the IB stack (cm and rdma-cm) level and not in
> >> the specific MPI implementation level.
> > 
> > Are the out of band connections used by MPI formed using client/server
> > or peer to peer?  I believe that Intel MPI has each rank listen for
> > connections from the ranks below it using client/server.
> 
> Yes, MPIs that do all-to-all connect on job start typically
> use client/server, where all the ranks > 0 issue a listen call
> and then all lower ranks connect to higher ranks, or some
> other symmetry-breaking scheme. I am trying to see what needs
> to be supported by the IB stack to let MPIs that do connect
> on demand use the RDMA-CM.
> 
> > There are a couple of problems with the peer to peer model.  First, 
> > unless the connections occur at exactly the same time, they miss 
> > connecting (rejected with invalid SID).
> 
> This makes the whole peer-to-peer model useless, since an app
> cannot make sure that connections occur at exactly the same
> time! My understanding of the spec is that the peer-to-peer
> model can handle connections that occur at exactly the same
> time, but not only those.
> 
> > Second, if multiple peer to
> > peer connections need to form between the same pair of nodes, things
> > can go screwy (that's the technical term) trying to match up the peer
> > requests.
> 
> Under MPI each rank uses a different SID, so I think we are 
> safe from this problem.
> 
> Or