[ofa-general] Re: Re: IPoIB-CM UC mode

Michael S. Tsirkin mst at dev.mellanox.co.il
Mon Jul 2 12:53:14 PDT 2007


> > Quoting Or Gerlitz <or.gerlitz at gmail.com>:
> > Subject: Re: Re: IPoIB-CM UC mode
> > 
> > On 7/2/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > 
> > > Quoting Or Gerlitz <ogerlitz at voltaire.com>:
> > 
> > > Is there any --other-- part of the stack (eg mthca,cm) that needs to be
> > > enhanced for that?
> > 
> > Not a whole lot.
> > We need an API to detect this feature support in HW.  There could be a bit of
> > work in mthca to detect HW/FW support for this feature, and enable connecting UC 
> > QPs to SRQ.  There could be a bit of debugging work in CM in case we hit some
> > bugs with LAP messages (which I plan to use for liveness detection).
> 
> 
> 
> Thanks for the info. Can you please elaborate a little more on the LAP based
> liveness detection mechanism you were thinking about? I might want to deploy
> it in another app.

With UC, if the remote side loses our QP, we get no indication whatsoever.  But
we don't want to destroy/recreate connections unless strictly necessary.

So we must send something that will force the remote side to respond. One such
message is a LAP with the current primary path used as the proposed alternate
path. The remote will respond with an APR with AP status 5 if the connection is
there, and status 1 if it is not.
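
A minimal sketch of how the sender of such a LAP might interpret the APR it
gets back. The enum and function names here are illustrative assumptions, not
taken from the IPoIB or CM code; only the two status values (5 and 1) come from
the description above. Any other status is treated as inconclusive.

```c
/* AP status values carried in an APR reply. Only the two values discussed
 * above are listed; the names are hypothetical, chosen for this sketch. */
enum apr_status {
    APR_INVALID_COMM_ID = 1, /* remote no longer knows this connection */
    APR_IS_CURRENT      = 5, /* proposed alternate path equals the primary */
};

/* Result of a liveness probe, as seen by the side that sent the LAP. */
enum peer_state {
    PEER_ALIVE,   /* remote still has the connection */
    PEER_GONE,    /* remote lost the connection: tear down and reconnect */
    PEER_UNKNOWN, /* some other status: this probe tells us nothing */
};

/* Map the AP status from an APR onto a liveness verdict. */
static enum peer_state classify_apr(int ap_status)
{
    switch (ap_status) {
    case APR_IS_CURRENT:
        return PEER_ALIVE;
    case APR_INVALID_COMM_ID:
        return PEER_GONE;
    default:
        return PEER_UNKNOWN;
    }
}
```

With this mapping the connection is destroyed and recreated only on
PEER_GONE, which matches the goal of not recreating connections unless
strictly necessary.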

> Actually, looking on IPoIB-CM RC based implementation I don't really
> understand its "liveness detection" mechanism... In ipoib_cm_send_req() I see
> that the code sets both the RC QP retries AND rnr retries to 0... doesn't this
> mean that a single RNR NAK would cause a TX QP to move to ERROR?

Yes, this is from the spec, BTW.
More importantly, a timeout will cause an error, and we'll retry the connection
on the next packet.
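
To illustrate the RNR question, here is a small model of the retry budget. It
is a sketch under stated assumptions, not driver code: the function name is
hypothetical, and the only protocol fact relied on is that an RNR retry count
of 7 is the InfiniBand encoding for "retry indefinitely".

```c
#include <stdbool.h>

#define RNR_RETRY_INFINITE 7 /* IB encoding: 7 means retry indefinitely */

/* Returns true if the send QP would move to the error state after seeing
 * `naks` consecutive RNR NAKs for one request, given the configured
 * RNR retry count. Hypothetical helper for illustration only. */
static bool qp_errors_after_rnr(unsigned rnr_retry, unsigned naks)
{
    if (rnr_retry == RNR_RETRY_INFINITE)
        return false;        /* never gives up on RNR NAKs */
    return naks > rnr_retry; /* retry budget exhausted */
}
```

With rnr_retry set to 0, as in ipoib_cm_send_req(), the first RNR NAK already
exhausts the budget, which is the behaviour confirmed above.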

> can you
> clarify how do you use the "R" of "RC" here?

The two reasons I used RC are:
1. UC does not support SRQ yet.
2. It's easier to detect that the connection is alive.

-- 
MST


