[openib-general] RFC userspace CMA
Tom Tucker
tom at opengridcomputing.com
Wed Oct 26 10:47:32 PDT 2005
On Wed, 2005-10-26 at 08:41 -0700, Sean Hefty wrote:
> Tom Tucker wrote:
> > FYI, I've started writing the iw_cm that sits below the rdma_cm. Here's
> > the general picture I have in mind.
> >
> >           +---------+
> >           | RDMA CM |
> >           +-+-----+-+
> >             |     |
> >        +----+     +----+
> >        |               |
> >   +---------+     +----+----+
> >   |  IB CM  |     |  IW CM  |
> >   +----+----+     +----+----+
> >        |               |
> >    ____+_____      ____+_____
> >   +---------+|    +---------+|
> >   | IB devs ||    | IW devs ||
> >   +---------+     +---------+
>
> This is what I was envisioning as well.
>
> > I am also migrating the current iw_cm.h file to match the interfaces in
> > the rdma_cm more closely.
>
> Note that there are still some changes occurring to the rdma_cm to support
> userspace. I'm concerned about how well these changes map to iWarp, since the
> changes expose the three-way CM handshake used by IB.
>
> > In general, the IW CM methods look very much like sockets connect,
> > listen, and accept. There is an iw_cm_id like the ib_cm_id that
> > encapsulates the 5-tuple, a callback for IW CM events and a "provider
> > handle" that represents the adapter "connection cookie". The iw_cm_id is
> > passed to connect, accept, etc...
>
> Something that didn't make sense for the kernel rdma_cm running over IB was
> adding a backlog parameter to the listen request. (The IB CM is callback
> driven, so there's not really a backlog.) I will probably add this to the
> userspace API. Does iWarp need a backlog parameter in the kernel?
It is needed by some adapters. For the AMSO1100, it's passed down to the
adapter to reserve SYN cache entries for incoming connections.
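
As a rough illustration only (a sketch, not the actual iw_cm interface):
the names FiveTuple, FakeAdapter, IwCmId and iw_cm_listen below are
hypothetical, and the fixed-size cache is an assumption made for the
example. It shows a listen call handing the backlog to the adapter, which
reserves that many SYN cache entries or refuses the request.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FiveTuple:
    """Protocol plus local and remote address/port."""
    protocol: str
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class FakeAdapter:
    """Hypothetical adapter with a fixed number of SYN cache entries."""

    def __init__(self, syn_cache_entries: int) -> None:
        self.free_entries = syn_cache_entries

    def reserve_syn_cache(self, backlog: int) -> bool:
        # Refuse the listen if the cache cannot hold `backlog` entries.
        if backlog < 0 or backlog > self.free_entries:
            return False
        self.free_entries -= backlog
        return True


@dataclass
class IwCmId:
    """Hypothetical stand-in for an iw_cm_id."""
    addr: FiveTuple
    event_handler: Callable[[str], None]
    # Assumption: the adapter object itself plays the "provider handle" role.
    provider_handle: Optional[FakeAdapter] = None


def iw_cm_listen(cm_id: IwCmId, adapter: FakeAdapter, backlog: int) -> bool:
    """Pass the backlog down to the adapter; report the result via callback."""
    ok = adapter.reserve_syn_cache(backlog)
    if ok:
        cm_id.provider_handle = adapter
    cm_id.event_handler("LISTEN_OK" if ok else "LISTEN_FAILED")
    return ok
```
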
>
> > depending on the model. This means that calls like listen with a local
> > port wildcard can't return until the "listen_reply" comes back from the
> > adapter.
>
> I didn't quite follow this. Right now, the rdma_cm only tries to support
> wildcard IP addresses. Are you wanting to support listening on any port as
> well? What is a listen_reply?
>
Yes, it's funky. Basically, the listen in this context is the combination
of a 'bind' and a 'listen'. If you specify 0 for a port number on bind,
the stack will allocate one for you.
MPI uses this to allocate a port and then advertises this port to a
central server (the node of rank 0), which tells the other servers how to
contact each other. This avoids having to allocate a well known port
for each node in the MPI cluster and allows multiple apps to run
concurrently without allocating additional well-known ports.
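
The plain sockets version of this pattern looks roughly like the sketch
below. The bind/listen/getsockname calls are the standard socket API; the
helper names (listen_on_any_port, advertise) and the dictionary standing in
for the rank 0 node are made up for illustration, as is the use of the
loopback address.

```python
import socket

# Stand-in for the rank 0 node: maps rank -> (address, port).
registry = {}


def listen_on_any_port(backlog=16):
    """Bind to port 0 so the stack picks a free port, then listen on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))       # port 0: let the stack allocate one
    sock.listen(backlog)
    port = sock.getsockname()[1]      # the port the stack actually chose
    return sock, port


def advertise(rank, port):
    """Record the chosen port so that other ranks can look it up."""
    registry[rank] = ("127.0.0.1", port)
```

Each rank would call listen_on_any_port(), then advertise(rank, port), and
its peers would read the address back from the registry, so no per-node
well-known port is needed.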
The listen_reply from the adapter returns the port chosen and the status
of the listen request.
It is somewhat analogous to the insert_listen_.... in the IB CM
framework.
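
A minimal sketch of why the listen call has to wait, assuming the adapter
answers asynchronously: the caller posts the request and blocks until the
reply carrying the status and the chosen port arrives. ListenReply,
AsyncAdapter, post_listen and the starting port 49152 are all hypothetical
and only model the behaviour described above.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ListenReply:
    status: int   # 0 means success in this sketch
    port: int     # port chosen by the adapter


class AsyncAdapter:
    """Hypothetical adapter that completes listen requests on another thread."""

    def __init__(self, first_free_port=49152):
        self._next_port = first_free_port
        self._lock = threading.Lock()

    def post_listen(self, port, backlog, reply_queue):
        def complete():
            chosen = port
            if port == 0:
                # Wildcard port: the adapter allocates one.
                with self._lock:
                    chosen = self._next_port
                    self._next_port += 1
            reply_queue.put(ListenReply(status=0, port=chosen))

        threading.Thread(target=complete, daemon=True).start()


def listen(adapter, port, backlog, timeout=1.0):
    """Post the listen and block until the listen_reply comes back."""
    replies = queue.Queue(maxsize=1)
    adapter.post_listen(port, backlog, replies)
    reply = replies.get(timeout=timeout)   # raises queue.Empty on timeout
    if reply.status != 0:
        raise OSError("listen failed with status %d" % reply.status)
    return reply.port
```
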
> - Sean