[ofa-general] Re: Re: IPoIB-CM UC mode

Or Gerlitz or.gerlitz at gmail.com
Mon Jul 2 13:13:42 PDT 2007


On 7/2/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
>
> > > Quoting Or Gerlitz <or.gerlitz at gmail.com>:



> > Thanks for the info. Can you please elaborate a little more on the
> > LAP based liveness detection mechanism you were thinking about? I
> > might want to deploy it in another app.
>
> With UC, if the remote side loses our QP, we get no indication
> whatsoever. But we don't want to destroy/recreate connections unless
> strictly necessary.


why do we care if remote side lost our QP? my thinking is that we (TX QP)
should care if the remote side (RX QP) is still there, and this is
achieved by RC as you explain below.


> So we must send something that will force remote side to respond. One such
> message is LAP with current primary path used as proposed alternate path.
> Remote will respond with APR with AP status 5 if the connection is there,
> and status 1 if it is not.


got it. the current app i was referring to uses UD and not UC, so I guess
LAP is not possible.
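
To make the LAP/APR probe above concrete, here is a minimal sketch of the
decision it implies. Only the two AP status values quoted above (5 and 1)
come from the thread; the enum and function names are made up for
illustration and are not the kernel CM API, and any other status value is
simply treated as "no conclusion".

```c
#include <stdbool.h>

/* AP status values in the APR reply. Only the two codes quoted in this
 * thread are named; the enumerator names are hypothetical. */
enum apr_status {
    APR_CONNECTION_NOT_FOUND = 1, /* remote does not know the connection */
    APR_IS_CURRENT_PRIMARY   = 5  /* proposed alternate == current primary */
};

enum peer_state { PEER_ALIVE, PEER_GONE, PEER_UNKNOWN };

/* Classify the remote side from the APR that answers a probe LAP whose
 * proposed alternate path is the current primary path. */
static enum peer_state classify_apr(int ap_status)
{
    switch (ap_status) {
    case APR_IS_CURRENT_PRIMARY:
        return PEER_ALIVE;   /* connection is still there */
    case APR_CONNECTION_NOT_FOUND:
        return PEER_GONE;    /* remote lost our QP/connection */
    default:
        return PEER_UNKNOWN; /* codes not discussed here: no conclusion */
    }
}

/* Destroy/recreate only when the peer positively reports the connection
 * is gone, in line with "unless strictly necessary". */
static bool should_reconnect(int ap_status)
{
    return classify_apr(ap_status) == PEER_GONE;
}
```

What to do when no APR arrives at all is not covered in the thread, so the
sketch leaves that case out.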

> > Actually, looking at the IPoIB-CM RC based implementation I don't really
> > understand its "liveness detection" mechanism... In ipoib_cm_send_req()
> > I see that the code sets both the RC QP retries AND rnr retries to 0...
> > doesn't this mean that a single RNR NAK would cause a TX QP to move to
> > ERROR?
>
> Yes, this is from spec, BTW.
> More importantly, a timeout will cause error, and we'll retry connection
> on next packet.


so with the current IPoIB-CM implementation, a single RNR NAK and/or a
single lost ACK would cause a re-connection, wow... this does not sound
very production-ready...
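
A toy model of the retry counters may help show why zero retries behaves
this way. This is a simplified sketch, not the driver or HCA logic: the
event names and qp_survives() are hypothetical, the counters are assumed to
count consecutive failures and to be reloaded on every ACK, and the only
detail taken from the verbs interface is that an rnr_retry value of 7 means
"retry forever".

```c
#include <stdbool.h>
#include <stddef.h>

/* What the sender observes for each posted message (hypothetical names). */
enum tx_event { EV_ACK, EV_RNR_NAK, EV_ACK_TIMEOUT };

#define RNR_RETRY_INFINITE 7 /* rnr_retry == 7 means retry indefinitely */

/* Returns true if the QP is still usable after the given events, false if
 * it would have moved to the error state. Simplified model: counters count
 * consecutive failures and are reloaded on every ACK. */
static bool qp_survives(const enum tx_event *ev, size_t n,
                        unsigned retry_cnt, unsigned rnr_retry)
{
    unsigned retries_left = retry_cnt;
    unsigned rnr_left = rnr_retry;

    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_ACK:
            /* progress was made: reload both budgets */
            retries_left = retry_cnt;
            rnr_left = rnr_retry;
            break;
        case EV_RNR_NAK:
            if (rnr_retry == RNR_RETRY_INFINITE)
                break;            /* never gives up on RNR NAKs */
            if (rnr_left == 0)
                return false;     /* budget exhausted: QP goes to error */
            rnr_left--;
            break;
        case EV_ACK_TIMEOUT:
            if (retries_left == 0)
                return false;     /* budget exhausted: QP goes to error */
            retries_left--;
            break;
        }
    }
    return true;
}
```

With both values set to 0, the first RNR NAK or the first ACK timeout
returns false, which is the reconnect case discussed above; any non-zero
budget lets the same event sequence pass.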

My understanding is that

A) as the IP layer is seen as unreliable, RC buys us nothing
B) the current code's usage of RC
B.1) is inefficient by nature, since it loads the IB fabric with ACKs and
     NAKs
B.2) reconnects on each loss/NAK, which adds more inefficiency

so we should move to UC.

am i missing something? what does RC buy us that UC does not?

> > can you clarify how do you use the "R" of "RC" here?
>
> The two reasons I used RC are:
> 1. UC does not support SRQ yet.
> 2. It's easier to detect connection is alive.
>

I wanted to understand the "how" in detail and not high-level (2 above)
or env reasons (1 above)

thanks anyway,

Or.