[ofa-general] [RFC PATCH 3/4] rdma/cma: add high availability mode attribute to IDs

Tue May 13 14:15:04 PDT 2008

On Tue, May 13, 2008 at 7:13 AM, Or Gerlitz <ogerlitz at voltaire.com> wrote:
> RDMA_ALIGN_WITH_NETDEVICE high availability (ha) mode means that the consumer
>  of the rdma-cm wants RDMA sessions to always use the same links (eg <hca/port>)
>  as the IP stack does. In the current code, this does not happen when bonding has
>  failed over but the IB link used by an already existing session is still operating fine.
>  For now this mode is supported only for the connected services of the rdma-cm.
>

I'm not sure I've ever seen an "RDMA Session".

There are lots of RDMA *connections*, and there are RDMA applications
that have an application-layer session that use several RDMA connections.
But I'm fairly certain that there is no such thing as an "RDMA Session".

That raises serious doubts about automatic connection teardown
based upon decisions made at the RDMA layer.

This will also create problems with iWARP/IB compatibility.

The iWARP standards (IETF and RDMAC) both solve the problem of
RDMA endpoint / IP Address affinity by simply mandating it. While
the standards give no concrete mechanism, the mandate has generally
been interpreted to mean:

- You cannot create an RDMA connection on a device (or assign an
  existing TCP connection to an RDMA endpoint) if the device is
  not a valid route given the source/destination IP Addresses.
- You can determine the set of possible RDMA devices by first
  consulting the local routing tables using the desired source
  and destination IP addresses.
- If an RDMA device is no longer a valid route for a connection
  then the underlying TCP connection will fail (and it would be
  really nice if this happened promptly when the reason is a network
  reconfiguration, rather than just waiting for things to fail).
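To make the second point concrete, here is a minimal Python sketch that
consults a toy routing table and returns the candidate devices for a
destination, most preferred first. The table layout, the device names
and the function names are hypothetical; preference is modeled as
longest prefix, then lowest metric, which only approximates real kernel
route selection (no source address, policy rules or multipath).

```python
import ipaddress
from typing import List, Tuple

# Hypothetical routing table entries: (destination prefix, device, metric).
Route = Tuple[str, str, int]

EXAMPLE_TABLE: List[Route] = [
    ("10.0.0.0/8", "eth2", 100),
    ("10.1.0.0/16", "ib0", 10),
    ("10.1.0.0/16", "ib1", 20),
    ("0.0.0.0/0", "eth0", 0),
]

def candidate_devices(table: List[Route], dst: str) -> List[str]:
    """Return every device with a route covering dst, most preferred first.

    Preference is longest prefix, then lowest metric (an approximation).
    """
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, dev, metric in table:
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            matches.append((-net.prefixlen, metric, dev))
    matches.sort()  # longest prefix first, then lowest metric
    devices: List[str] = []
    for _, _, dev in matches:
        if dev not in devices:  # keep only the best entry per device
            devices.append(dev)
    return devices

def is_valid_route(table: List[Route], dst: str, dev: str) -> bool:
    """A device may carry a connection to dst only if some route to dst uses it."""
    return dev in candidate_devices(table, dst)
```

With `EXAMPLE_TABLE`, `ib1` is not the preferred device for 10.1.2.3,
yet it is still a valid route for it.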

An important corner case here is that there may not be a need
to migrate an existing RDMA Connection to a new device just
because the *preferred* route has changed. The non-preferred
route may still be fully operable and it may be preferable to
continue using it for *this* connection given the cost of tear
down and start up.

Keep in mind that if the old route does not work, the connection
will fail fairly quickly. If failing quickly is important, the
device should have mechanisms to ensure that stale ARP or Neighbor
Discovery entries do not linger. If the ARP/ND information is
erased, the connection will be torn down very quickly (destination
unreachable).

Now, for both IB and iWARP there is a substantial possibility
that a connection can be migrated to a different port within
the same or co-operating devices. In that case, high
availability is achieved without the application having to
be involved at all.

If the connection is going to have to be re-established on a
*different* device there is a substantial risk that this will
involve re-registering memory, re-connecting, and
re-advertising buffers. I don't see how you can wisely
decide that the benefits of a preferred route outweigh
these costs on an application-independent basis.
What if the application was nearly done with the
connection? Or knew that it would be ending a
current burst of activity in a few seconds and
could pay for the connection shift-back then?
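The trade-off above depends on inputs that only the application has. As
a purely illustrative Python sketch (every field name is hypothetical,
and real costs are much harder to estimate), a keep/migrate/defer
decision might look like this:

```python
from dataclasses import dataclass

@dataclass
class ConnState:
    """Hypothetical application-level view of one RDMA connection."""
    old_route_ok: bool       # is the current device still a valid route?
    remaining_bytes: float   # work left on this connection
    current_bw: float        # bytes/s achievable on the current device
    preferred_bw: float      # bytes/s expected on the preferred device
    setup_cost_s: float      # re-register memory, reconnect, re-advertise buffers
    pause_expected: bool     # will the current burst end soon?

def migration_decision(c: ConnState) -> str:
    """Return 'now', 'defer' or 'stay' for a possible move to the preferred route."""
    if not c.old_route_ok:
        return "now"  # the old path is dead, so re-establishment is unavoidable
    stay_s = c.remaining_bytes / c.current_bw
    move_s = c.setup_cost_s + c.remaining_bytes / c.preferred_bw
    if move_s >= stay_s:
        return "stay"  # nearly done, or the preferred route is not worth the cost
    if c.pause_expected:
        return "defer"  # shift back at the coming pause instead of mid-burst
    return "now"
```

None of these inputs is visible to the rdma-cm, which is the point of
the paragraph above.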

And if the application is going to make the decision, then
can't it just subscribe to the local routing tables on its
own without any help from OFA?
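On Linux this is already possible through rtnetlink: open an AF_NETLINK
socket with the NETLINK_ROUTE protocol and bind it to the
RTMGRP_IPV4_ROUTE / RTMGRP_IPV6_ROUTE multicast groups to receive
RTM_NEWROUTE and RTM_DELROUTE notifications. Below is a minimal Python
sketch, assuming Linux; the helper names are hypothetical, and it only
classifies messages rather than decoding the rtmsg payload that says
which prefix and device changed.

```python
import socket
import struct
from typing import Iterator, Tuple

# Constants from <linux/rtnetlink.h>.
RTMGRP_IPV4_ROUTE = 0x40
RTMGRP_IPV6_ROUTE = 0x400
RTM_NEWROUTE = 24
RTM_DELROUTE = 25

# struct nlmsghdr: length, type, flags, sequence number, port id.
NLMSGHDR = struct.Struct("=IHHII")

def parse_nlmsgs(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Split one netlink datagram into (message type, payload) pairs."""
    offset = 0
    while offset + NLMSGHDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, offset)
        if length < NLMSGHDR.size or offset + length > len(buf):
            break  # truncated or malformed message
        yield msg_type, buf[offset + NLMSGHDR.size:offset + length]
        offset += (length + 3) & ~3  # NLMSG_ALIGN: messages are 4-byte aligned

def watch_route_changes() -> Iterator[str]:
    """Block forever, yielding 'added' or 'removed' per route change (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_IPV4_ROUTE | RTMGRP_IPV6_ROUTE))  # pid 0: kernel assigns
    with sock:
        while True:
            for msg_type, _payload in parse_nlmsgs(sock.recv(65536)):
                if msg_type == RTM_NEWROUTE:
                    yield "added"
                elif msg_type == RTM_DELROUTE:
                    yield "removed"
```

An application would consume `watch_route_changes()` in its own thread
and re-run its route lookup, and its own keep-or-migrate policy, on each
event.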

Even if it is in response to a failure on the old connection,
any application that has a "session" concept will have
procedures for re-establishing the session on a new
connection. Where is the need for a one-size-fits-none
standardized solution?
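For illustration only, here is a minimal Python sketch of the kind of
session-level recovery such an application already has: it keeps
unacknowledged messages and replays them on whatever new connection it
establishes, on any device. All names are hypothetical, `send` stands in
for posting on an RDMA connection, and memory re-registration and buffer
re-advertisement are left out.

```python
from typing import Callable, Dict

SendFn = Callable[[int, bytes], None]  # hypothetical "post on connection" hook

class Session:
    """Hypothetical application-layer session that outlives any one connection."""

    def __init__(self, send: SendFn):
        self.send = send
        self.next_seq = 0
        self.unacked: Dict[int, bytes] = {}

    def submit(self, data: bytes) -> int:
        """Send data on the current connection and remember it until acked."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = data
        self.send(seq, data)
        return seq

    def ack(self, upto: int) -> None:
        """The peer confirmed every message with sequence number <= upto."""
        for seq in [s for s in self.unacked if s <= upto]:
            del self.unacked[seq]

    def reattach(self, send: SendFn) -> None:
        """Switch to a new connection and replay unacknowledged messages in order."""
        self.send = send
        for seq in sorted(self.unacked):
            send(seq, self.unacked[seq])
```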