[ofa-general] disconnect implementation for rdma cm unconnected datagram service

Or Gerlitz or.gerlitz at gmail.com
Mon Jun 18 11:54:10 PDT 2007


On 6/18/07, Sean Hefty <mshefty at ichips.intel.com> wrote:
>
> Or Gerlitz wrote:
> > Looking at cm_sidr_rep_handler we see that the cm id state
> > is reset to IB_CM_IDLE, and on the other hand ib_send_cm_dreq
> > returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess
> > this means that rdma_disconnect on RDMA_PS_UDP would never work?
>
> Correct - there isn't a disconnect for UDP.


Was that done on purpose? Is there any problem (e.g. implementation or spec
related) with sending a DREQ through the CM?
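
To make the state issue concrete, here is a toy model of the two checks
mentioned above. It is only a sketch: the toy_* names are made up for
illustration and this is not the kernel code, it just mirrors the behaviour
described (SIDR REP puts the id back to idle, DREQ is refused unless the id
is established).

```c
#include <errno.h>

/* Toy model only: names are hypothetical, loosely mirroring the IB CM states. */
enum toy_cm_state { TOY_CM_IDLE, TOY_CM_SIDR_REQ_SENT, TOY_CM_ESTABLISHED };

struct toy_cm_id {
    enum toy_cm_state state;
};

/* Models the SIDR REP handler: once the reply arrives the id goes back to idle. */
static void toy_sidr_rep_handler(struct toy_cm_id *id)
{
    if (id->state == TOY_CM_SIDR_REQ_SENT)
        id->state = TOY_CM_IDLE;
}

/* Models the DREQ send path: only an established id may send a DREQ. */
static int toy_send_dreq(const struct toy_cm_id *id)
{
    if (id->state != TOY_CM_ESTABLISHED)
        return -EINVAL;
    return 0;
}
```

Since a UDP id never reaches the established state in this model, the DREQ
path always fails for it, which is the behaviour in question.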


> > Thinking about a remote qp/lid change, the equivalent I see for UDP based
> > apps is that a remote qp/lid change would have been caught by the local
> > stack neighbouring system, since it sends a few unicast ARP probes and
> > then re-issues a broadcast ARP from which the new HW address
> > (qpn / gid --> lid) would be learned.
> >
> > What do you think would be the correct way to solve that for rdmacm based
> > apps?
>
> I don't know that we can do anything about a QP change.


Just to emphasize, a typical QP change here is when a remote server process
exits and is then spawned again, so the client has to reconnect or else all
its packets go nowhere.


> > is there a way for the RDMA/IB stack level to provide the solution? we
> > were
>
> Once the inform_info patches are in, we might be able to hook into that
> to at least provide notification that the remote address has changed.  I
> don't think there's a LID change notice, though, only GID IN/OUT.  LID
> changes would be difficult to hide from the app anyway, since the app
> must re-create their address vector.


I did not mean to hide it from the app totally (e.g. to the extent that there
is no need to re-create the address vector); I just wonder whether the
mechanics of realizing that an unconnected rdmacm id is not "connected" any
more can be fully implemented within the rdmacm.


> If we ever go as far as adding an rdma_send() call, we might be able to
> hide it better.


I don't think we want to go there.


> > I guess that a remote lid change can be emulated as a disconnect if the
> > rdmacm would listen on IN/OUT traps, but the question is what can we do
> > about the remote process qp, e.g. in the case where the process dies and
> > then comes back again etc.
>
> I think the current solution is that the app must detect that they are
> no longer getting responses from the remote side and try to
> re-'connect'.  I need to give this more thought to determine if there's
> anything that we can do here.  (This seems hard without the rdma_cm
> controlling the QP and CQs.)  Do you have any ideas?


Indeed, this is not easily possible in all cases for us, as we are not always
allowed to add a wire protocol on --this-- QP, but we are looking into that.
Another solution we are considering is to "invalidate" the app level "address
handle" (IB AH + remote QPN) every ten seconds or so and then re-connect,
but this is not very efficient.
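
Roughly, the periodic invalidation would look like the sketch below. The
peer_addr struct, PEER_TTL_SEC and the helper names are hypothetical, made up
only to illustrate the idea; the real entry would carry the IB AH and the
remote QPN, and the re-connect itself is left out.

```c
#include <stdbool.h>
#include <time.h>

#define PEER_TTL_SEC 10 /* assumed value for "every ten seconds or so" */

/* Hypothetical app-level cache entry standing in for IB AH + remote QPN. */
struct peer_addr {
    bool   valid;
    time_t resolved_at;
};

/* Call after a successful (re-)connect to restart the timer. */
static void peer_addr_mark_resolved(struct peer_addr *p, time_t now)
{
    p->valid = true;
    p->resolved_at = now;
}

/* Returns true when the entry is stale and the app must re-connect. */
static bool peer_addr_needs_reconnect(struct peer_addr *p, time_t now)
{
    if (p->valid && now - p->resolved_at >= PEER_TTL_SEC)
        p->valid = false; /* invalidate on expiry */
    return !p->valid;
}
```

The cost is a re-connect every PEER_TTL_SEC per peer even when nothing has
changed on the remote side, which is why it is not very efficient.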

Or.