[ofa-general] [RFC PATCH 4/4] rdma/cma: implement RDMA_ALIGN_WITH_NETDEVICE ha mode

Wed May 14 09:24:16 PDT 2008

On Wed, May 14, 2008 at 5:33 AM, Or Gerlitz <ogerlitz at voltaire.com> wrote:
> Steve Wise wrote:
>>
>> Maybe this should really be implemented in the ULP that wants this
>> behavior.  I.e., the ULP could register for routing/neighbour changes,
>> tear down connections and re-establish them on the correct device.
>>
> Hi Steve,
>
> First, registration for neighbour changes can't serve the purpose of
> aligning RDMA traffic with the IP stack, for a number of reasons, among
> them:
>
> - for IB, no neighbour is created at the passive side of the unicast session
>
> - for unicast sessions, address resolution involves ARP but the neighbour
> may be deleted by the kernel since the rdma traffic does not go through the
> stack
>
> - for multicast sessions, no neighbour is created during address resolution
>
> Second, the rdma-cm does well in saving the ULP from interacting with the
> network stack, that is, the ULP is not aware of the routing lookup /
> neighbour / net device used for address resolution. In that spirit I prefer
> to add the registration for net events at the low level (rdma-cm).
>
> Third, thanks for bringing the point of route changes :)
>
> Or.
>
>

Perhaps the most fundamental difference between RDMA services
and the traditional socket interface is that RDMA services need
to be bound to a specific device.

When establishing a connection (or flow) the application needs to
select which device to use. Traditional socket applications do not
need to do this; for RDMA applications, letting rdma-cm make the
selection seems to be an acceptable solution.

The trickier problem is the one you raise about migrating a
connection or flow when IP routing is reconfigured. To a classic
socket application, each IP datagram generated is sent according to
the current routing tables. A connection or flow is not sticky.
An RDMA connection (or IB UD flow) is sticky. The question
is how sticky it should be.

If it is too sticky, the application may have to wait for a time-out,
or be stuck using an inferior path after the primary path is restored.
Both outcomes are obviously undesirable.

But what you have not addressed is how this compares with
the cost of forcing the application session to shift connections
even when the inferior path would have been acceptable.

Is it not true that the lower performance of an inferior path
may be preferable to the cost of tearing down and recreating
a connection (and its associated protection domains and
memory regions)?
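
To make the trade-off concrete, here is a minimal sketch of the
break-even comparison. Every name in it (path_estimate, rebuild_cost,
worth_migrating) is hypothetical and nothing here is part of rdma-cm;
the throughput and cost figures are assumed to be supplied by the
application.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical; none of them come from rdma-cm. */
struct path_estimate {
    double bytes_per_sec;   /* expected throughput on this path */
};

struct rebuild_cost {
    double teardown_sec;    /* draining and destroying the old connection */
    double reconnect_sec;   /* address/route resolution plus connect */
    double reregister_sec;  /* re-creating the PD and memory regions */
};

/* Time needed to move `bytes` over the given path. */
static double transfer_sec(double bytes, const struct path_estimate *p)
{
    assert(p->bytes_per_sec > 0.0);
    return bytes / p->bytes_per_sec;
}

/*
 * Returns true only when moving to the better path finishes the
 * remaining work sooner than staying put, once the full rebuild
 * cost has been charged to the move.
 */
static bool worth_migrating(double remaining_bytes,
                            const struct path_estimate *current,
                            const struct path_estimate *better,
                            const struct rebuild_cost *cost)
{
    double stay = transfer_sec(remaining_bytes, current);
    double move = cost->teardown_sec + cost->reconnect_sec +
                  cost->reregister_sec +
                  transfer_sec(remaining_bytes, better);

    return move < stay;
}
```

With an assumed 0.5 s total rebuild cost and a better path that is
twice as fast as a 1 GB/s current path, a session with 4 GB left
gains from moving, while one with only 100 MB left does not.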

Because of those costs I can only see two options:

1) Merely enable the application to know when there has been
   a significant change in IP routing. If the current services are
   inadequate for this purpose then extend those rather than
   do an automatic connection teardown/rebuild.

2) Reduce the cost of connection teardown/rebuild by offering
   an option to "pre-bind" two RDMA devices so that memory
   registrations will be valid on both. This probably requires
   device level co-operation on L-Key/STag allocation, but
   it would be a reasonable feature to consider for the High
   Availability market.
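
As a rough illustration of option 1, the sketch below has the
connection manager only report a routing change and leave the
decision to an application-supplied policy. The types and functions
(route_event, conn_action, notify_route_event, rebuild_only_if_lost)
are invented for this example and are not the rdma-cm API; the
actual teardown/rebuild is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the rdma-cm API. */
enum route_event {
    ROUTE_PATH_LOST,        /* current device no longer reaches the peer */
    ROUTE_PREFERRED_MOVED   /* IP stack now prefers a different device */
};

enum conn_action {
    CONN_KEEP,              /* stay on the current device */
    CONN_REBUILD            /* tear down and reconnect on the new device */
};

/* Application-supplied policy: decides what a route event means for it. */
typedef enum conn_action (*route_policy_fn)(enum route_event ev, void *ctx);

struct conn {
    route_policy_fn policy; /* NULL means the application never opted in */
    void *policy_ctx;
    int rebuilds;           /* how many times this connection was rebuilt */
};

/*
 * Called by the (hypothetical) connection manager when IP routing
 * changes. It only reports the event; nothing is torn down unless
 * the application's policy asks for it.
 */
static enum conn_action notify_route_event(struct conn *c,
                                           enum route_event ev)
{
    enum conn_action act = CONN_KEEP;

    assert(c != NULL);
    if (c->policy != NULL)
        act = c->policy(ev, c->policy_ctx);
    if (act == CONN_REBUILD)
        c->rebuilds++;      /* placeholder for the real teardown/rebuild */
    return act;
}

/* Example policy: tolerate an inferior path, rebuild only on a dead one. */
static enum conn_action rebuild_only_if_lost(enum route_event ev, void *ctx)
{
    (void)ctx;
    return ev == ROUTE_PATH_LOST ? CONN_REBUILD : CONN_KEEP;
}
```

An application that installs rebuild_only_if_lost keeps its
connection when the preferred device merely changes, and a
connection with no policy installed is never rebuilt automatically.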

But making automatic connection teardown a standard feature
is not the best solution.