[ofw] Re: [ofa-general] [RFC] 3/5: IB ACM: libibacm

Sean Hefty sean.hefty at intel.com
Thu Sep 17 22:57:01 PDT 2009


>Not all the world is MPI.

The focus of this package is MPI, though.  The librdmacm interface does
perform standard PR queries for applications that use that interface.  I'm not
fond of the MAD interfaces, but I'm not trying to fix them with this.  We can
debate whether an application should use an interface that exposes path records
and the IB CM protocol directly, but the feedback from MPI and other developers
is that connection establishment over IB requires too much code and is too
difficult.

Short term, while the ib_acm is considered experimental, I want to call the
ib_acm from under the librdmacm interface.  This allows it to be used without
applications needing to change.  Long term, if the ib_acm can prove itself,
then accessing it directly from the kernel is a possibility.

>Your new acm stuff still does PR queries.

The primary reason for adding the PR query was to verify that the path
information returned by the ib_acm is usable.  A user needs some way to know
whether the ib_acm can be used on their cluster.  This was one of the last
things that I added, and I think it has value, even if only for verification
purposes.  The central mechanism the ib_acm employs to acquire path data uses
multicast.

>Anyone using libibverbs multicast needs to do PR queries from
>userspace.

The ib_acm uses libibverbs multicast and does not do PR queries.

>Anyone using libibcm needs to do PR queries from userspace.

Open MPI has coded to the libibcm and does not perform PR queries.

What's needed in either of the above cases is path information; however, there
are alternate ways of obtaining this information without involving a direct
query to the SA.  MPI and DAPL can connect over IB today without doing PR
queries.  While there are limitations to determining path information without
doing a PR query, there are also limitations to obtaining path information by
doing one.  Looking at current implementations, I would deduce that the latter
is more limiting than the former in practice.

>Therefore we should just jam the PR query stuff in libibcm, everyone
>can use that, and your acm can ride on the PR query code from
>libibcm for its own needs too.

These are the calls exposed through libibacm:

int ib_acm_resolve_name(char *src, char *dest,
	struct ib_acm_dev_addr *dev_addr, struct ibv_ah_attr *ah,
	struct ib_acm_resolve_data *data);
int ib_acm_resolve_ip(struct sockaddr *src, struct sockaddr *dest,
	struct ib_acm_dev_addr *dev_addr, struct ibv_ah_attr *ah,
	struct ib_acm_resolve_data *data);
int ib_acm_resolve_path(struct ib_path_record *path);
int ib_acm_query_path(struct ib_path_record *path);
int ib_acm_convert_to_path(struct ib_acm_dev_addr *dev_addr,
	struct ibv_ah_attr *ah, struct ib_acm_resolve_data *data,
	struct ib_path_record *path);

Of these, the most important to the problem I'm trying to solve is
ib_acm_resolve_ip().  I do not believe that we want to add an interface based
on socket addresses, which should be considered experimental, to libibcm,
libibumad, or librdmacm, where it would then need to be maintained.
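
As a rough illustration of how a caller might drive ib_acm_resolve_ip(), here
is a minimal sketch.  It makes several assumptions that are not part of the
package: the three structures are left opaque, resolve_ipv4_pair() is a
hypothetical helper, stub_resolve_ip() is a stand-in so the sketch builds
without libibacm, and a return value of 0 is assumed to mean success.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Opaque stand-ins: the real definitions would come from the libibacm and
 * libibverbs headers, which this sketch deliberately does not include. */
struct ib_acm_dev_addr;
struct ibv_ah_attr;
struct ib_acm_resolve_data;

/* Same parameter list as ib_acm_resolve_ip() in the prototypes above. */
typedef int (*resolve_ip_fn)(struct sockaddr *src, struct sockaddr *dest,
	struct ib_acm_dev_addr *dev_addr, struct ibv_ah_attr *ah,
	struct ib_acm_resolve_data *data);

/* Stand-in resolver (assumption): only checks that both addresses are IPv4
 * and reports success.  It performs no actual address resolution. */
static int stub_resolve_ip(struct sockaddr *src, struct sockaddr *dest,
	struct ib_acm_dev_addr *dev_addr, struct ibv_ah_attr *ah,
	struct ib_acm_resolve_data *data)
{
	(void) dev_addr;
	(void) ah;
	(void) data;
	return (src->sa_family == AF_INET && dest->sa_family == AF_INET) ? 0 : -1;
}

/* Hypothetical helper: parse two dotted-quad strings and pass them to the
 * resolver.  Returns -1 if either string fails to parse, otherwise whatever
 * the resolver returns. */
static int resolve_ipv4_pair(resolve_ip_fn resolve,
	const char *src_ip, const char *dest_ip,
	struct ib_acm_dev_addr *dev_addr, struct ibv_ah_attr *ah,
	struct ib_acm_resolve_data *data)
{
	struct sockaddr_in src, dest;

	memset(&src, 0, sizeof src);
	memset(&dest, 0, sizeof dest);
	src.sin_family = AF_INET;
	dest.sin_family = AF_INET;

	if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
	    inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1)
		return -1;

	return resolve((struct sockaddr *) &src, (struct sockaddr *) &dest,
		       dev_addr, ah, data);
}
```

With the real library, the caller would presumably pass ib_acm_resolve_ip
itself in place of stub_resolve_ip, along with pointers to real output
structures rather than NULL.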

If your objection is that ib_acm_query_path() should be moved to libibcm, that's
a possibility.  libibacm already interfaces to libibumad, and it was trivial to
add support for PR queries.  libibcm does not currently depend on libibumad.
And if you take a step back in the connection process, I don't know that support
for just PR queries is sufficient for establishing a connection over IB.  You
first need to identify the endpoint, which opens up the possibility of other SA
queries.

- Sean



