[ofa-general] iSER data corruption issues

Talpey, Thomas Thomas.Talpey at netapp.com
Wed Oct 3 13:48:38 PDT 2007


At 01:42 PM 10/3/2007, Pete Wyckoff wrote:
>Single RC QP, single CQ, no SRQ.  Only Send, Receive, and RDMA Write
>work requests are used.  After everything is connected up, a SCSI
>read sequence looks like:
>
>    initiator: register pages with FMR, write test pattern
>    initiator: Send request to target
>    target:    Recv request
>    target:    RDMA Write response to initiator
>    target:    Wait for CQ entry for local RDMA Write completion
>    target:    Send response to initiator
>    initiator: Recv response, access buffer
>...
>The IB spec seems to indicate that the contents of the RDMA Write
>buffer should be stable after completion of a subsequent send
>message (o9-20).  In fact, the "Wait for CQ entry" step on the
>target should be unnecessary, no?

Not only unnecessary, on some hardware it may even be meaningless.
A local completion means only that the hardware has accepted the
RDMA Write, not that it has been sent - and certainly not that it has
been received and placed in remote memory.
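
To make that distinction concrete, here is a toy model, as a sketch only.
It is plain Python, not the verbs API; ToyRCLink, run_read_response, and
the 0x1000 address are made up for illustration. It assumes the adapter
signals local completion as soon as it accepts a work request, and that
the connection delivers requests in the order they were posted, which is
the property Pete cites from the spec.

```python
from collections import deque


class ToyRCLink:
    """Toy one-way model of a reliable connection (not the verbs API)."""

    def __init__(self):
        self.in_flight = deque()    # accepted by the local adapter, not yet delivered
        self.local_cq = []          # local completions seen by the target
        self.initiator_memory = {}  # remote buffer, keyed by a made-up address
        self.initiator_recvs = []   # Send payloads received by the initiator

    def post(self, opcode, payload, addr=None):
        # Assumption: the adapter reports completion as soon as it accepts
        # the work request, the weakest behaviour described above.
        self.in_flight.append((opcode, payload, addr))
        self.local_cq.append(opcode)

    def deliver_next(self):
        # Requests reach the peer strictly in the order they were posted.
        opcode, payload, addr = self.in_flight.popleft()
        if opcode == "RDMA_WRITE":
            self.initiator_memory[addr] = payload
        else:  # "SEND"
            self.initiator_recvs.append(payload)


def run_read_response():
    link = ToyRCLink()
    link.post("RDMA_WRITE", b"read data", addr=0x1000)
    link.post("SEND", b"scsi response")

    # Both local completions exist, yet nothing has been placed remotely.
    placed_at_local_completion = dict(link.initiator_memory)

    # Deliver until the initiator sees the Send, then look at its buffer.
    while not link.initiator_recvs:
        link.deliver_next()
    return link.local_cq, placed_at_local_completion, link.initiator_memory


if __name__ == "__main__":
    print(run_read_response())
```

In this model the target's CQ entry for the RDMA Write says nothing about
the initiator's buffer. What the initiator can rely on is the arrival of
the Send that was posted after the Write.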

I would look into the dma_sync behavior on the receiver. Especially
on an Opteron, it's critical to synchronize the IOMMU and cachelines
to the right memory locations. Since the FMR code hides some of
this, it may be a challenge to trace. Can you try a memory
registration strategy other than FMR? NFS/RDMA lets you do that, for
example.

Tom.


