[openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support

Or Gerlitz ogerlitz at voltaire.com
Wed Nov 8 02:10:15 PST 2006


Michael S. Tsirkin wrote:
> Quoting Or Gerlitz <ogerlitz at voltaire.com>:
>>> Protocols that rely on RC ACK for reliability guarantees (like SDP), basically
>>> do not make it possible to address the hca failure case: you got an ACK, but
>>> remote hca could have failed without committing data to memory. So APM failover
>>> is a requirement for these. It could be iser does not need APM, fine.
>> This is news to me. Does your HCA first send an ACK and only then do
>> the DMA transaction and, if needed, generate the CQE !?!?!?

> I can't tell either way, but why not?
> Consider also that DMA write is a posted transaction - HCA gets no indication
> when it was committed to memory, so it can not delay the ACK until this occurs.

OK, OK, I now see the IB spec excerpt below; I was somehow expecting
too much from IB RC. Rethinking this, I see that it is more
problematic to support an ACK that follows a successful DMA write to memory:

9.7.5.1.6 ACKNOWLEDGE MESSAGE SCHEDULING

For SEND or RDMA WRITE requests, an ACK may be scheduled before
data is actually written into the responder’s memory. The ACK simply 
indicates that the data has successfully reached the fault domain of the 
responding node. That is, the data has been received by the channel
adapter and the channel adapter will write that data to the memory 
system of the responding node, or the responding application will at 
least be informed of the failure.

So anyway, what is your HCA's behavior with respect to this?

>> and how come APM is the solution to this crazy problem?

> If HCA failure is a crazy problem, then what is the sane problem APM does *not* solve?

You misunderstood me: the "crazy problem" referred to my
misconception of IB RC ACKs.

My question is: how does APM solve the problem of transactions whose
ACK was received but whose data was not written/committed to memory? I
was thinking that once the HCA senses a path failure, APM makes the QP
use the alternate path and retransmit all the unACKed messages, but you
are referring to an ACKed message...
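
To make the question concrete, here is a toy Python model of the case I
have in mind. It is only a sketch: it uses no real verbs or HCA API, the
names (Responder, receive, commit, hca_failure, run) are hypothetical, and
it assumes that retransmission after failover covers only unACKed messages,
which is my assumption and not something the spec text above states.

```python
# Toy model only: all names are hypothetical, no real verbs/HCA API is used.
from dataclasses import dataclass, field


@dataclass
class Responder:
    memory: list = field(default_factory=list)   # PSNs committed to host memory
    pending: list = field(default_factory=list)  # PSNs held by the HCA, DMA not done

    def receive(self, psn):
        # Per 9.7.5.1.6 the ACK may be scheduled once the data has reached
        # the channel adapter, before it is written to memory.
        self.pending.append(psn)
        return psn  # stands in for the ACK

    def commit(self):
        # The DMA write to host memory completes.
        self.memory.extend(self.pending)
        self.pending.clear()

    def hca_failure(self):
        # Data that was ACKed but not yet written is lost with the HCA.
        self.pending.clear()


def run():
    resp = Responder()
    sent = [1, 2, 3]
    acked = set()

    acked.add(resp.receive(1))
    resp.commit()               # PSN 1: ACKed and committed
    acked.add(resp.receive(2))
    resp.hca_failure()          # PSN 2: ACKed, never committed
    # PSN 3 never reached the responder, so it stays unACKed.

    # Assumption: after failover only unACKed messages are retransmitted.
    retransmit = [p for p in sent if p not in acked]
    lost = [p for p in sent if p in acked and p not in resp.memory]
    return retransmit, lost
```

In this model run() puts PSN 3 in the retransmit list and PSN 2 in the
lost list, and PSN 2 is exactly the ACKed message I am asking about.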

Or.






