[openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support
Michael S. Tsirkin
mst at mellanox.co.il
Wed Nov 8 05:13:19 PST 2006
Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support
>
> Michael S. Tsirkin wrote:
> > Quoting Or Gerlitz <ogerlitz at voltaire.com>:
> >>> Protocols that rely on RC ACK for reliability guarantees (like SDP), basically
> >>> do not make it possible to address the hca failure case: you got an ACK, but
> >>> remote hca could have failed without committing data to memory. So APM failover
> >>> is a requirement for these. It could be iser does not need APM, fine.
> >> This is news to me: does your HCA first send an ACK and only then do
> >> the DMA transaction and, if needed, generate the CQE !?!?!?
>
> > I can't tell either way, but why not?
> > Consider also that DMA write is a posted transaction - HCA gets no indication
> > when it was committed to memory, so it can not delay the ACK until this occurs.
>
> OK, OK, I see now the IB spec piece below; I was somehow expecting
> too much from IB RC... Rethinking this matter, I see now it's more
> problematic to support this ack-following-dma-memory-write-success
>
> 9.7.5.1.6 ACKNOWLEDGE MESSAGE SCHEDULING
>
> For SEND or RDMA WRITE requests, an ACK may be scheduled before
> data is actually written into the responder's memory. The ACK simply
> indicates that the data has successfully reached the fault domain of the
> responding node. That is, the data has been received by the channel
> adapter and the channel adapter will write that data to the memory
> system of the responding node, or the responding application will at
> least be informed of the failure.
>
> So anyway, what's your HCA behavior wrt this?
The behavior matches the spec. I can't give you extra guarantees.
> >> and how come APM is the solution to this crazy problem?
>
> > If HCA failure is a crazy problem, then what is the sane problem APM does *not* solve?
>
> you misunderstood me, the "crazy problem" was related to my
> misconception of IB RC ACKs.
>
> My question is: how does APM solve the problem with transactions whose
> ACK was received but whose data was not written/committed to memory?
APM does not solve it - I am just saying that the problem as formulated is not
solvable without protocol changes.
So all we can solve for a generic RC protocol is port/switch failure, and APM
solves this elegantly and transparently.
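
To illustrate what a "protocol change" could look like, here is a minimal
sketch, assuming the ULP adds its own application-level confirmation on top
of the RC ACK. It is not taken from ib_cm, SDP, or the verbs API; every name
in it (tracked_msg, on_transport_ack, on_app_confirm, can_release_buffer) is
hypothetical. The sender treats the RC ACK only as "data reached the
responder's fault domain" and keeps its buffer until the peer application
confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sender-side bookkeeping; not part of any real IB API. */
enum msg_state {
    MSG_POSTED,        /* send posted, nothing heard back yet */
    MSG_TRANSPORT_ACK, /* RC ACK seen: data reached the responder's fault domain */
    MSG_APP_CONFIRMED  /* peer application reported that it has the data */
};

struct tracked_msg {
    enum msg_state state;
};

/* Called when the send completion (i.e. the RC ACK) is observed. */
static void on_transport_ack(struct tracked_msg *m)
{
    assert(m != NULL);
    if (m->state == MSG_POSTED)
        m->state = MSG_TRANSPORT_ACK;
}

/* Called when the assumed application-level confirmation arrives. */
static void on_app_confirm(struct tracked_msg *m)
{
    assert(m != NULL);
    m->state = MSG_APP_CONFIRMED;
}

/* The buffer may be reused only after the application-level confirmation;
 * the RC ACK alone is not treated as proof the data is in memory. */
static bool can_release_buffer(const struct tracked_msg *m)
{
    assert(m != NULL);
    return m->state == MSG_APP_CONFIRMED;
}
```

With this bookkeeping, a message that has only seen the RC ACK can still be
retransmitted after a remote HCA failure, at the cost of an extra message per
confirmation. Whether that cost is acceptable is up to the ULP.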
--
MST