[ofa-general] iSER data corruption issues
Talpey, Thomas
Thomas.Talpey at netapp.com
Wed Oct 3 18:59:31 PDT 2007
At 06:04 PM 10/3/2007, Roland Dreier wrote:
> > I would look into the dma_sync behavior on the receiver. Especially
> > on an Opteron, it's critical to synchronize the iommu and cachelines
> > to the right memory locations. Since the FMR code hides some of
> > this, it may be a challenge to trace. Can you try another memory
> > registration strategy? NFS/RDMA can do that, for example.
>
>I think this is a red herring. Every IB HCA does 64-bit DMA, which
>means it bypasses all the Opteron iommu/swiotlb stuff.
>
>Also FMR doesn't hide any DMA mapping stuff; it is completely up to
>the consumer to handle all the DMA mapping, because FMRs operate
>completely at the level of bus (HCA DMA) addresses.
Fair enough, but the FMR *pools* still worry me, because they manage
internal registrations and defer unmapping them. Depending on many
things beyond the consumer's control, they sometimes leave the handles
advertised to the RDMA peer valid even after the consumer has unmapped
them.
Bypassing the pools and going directly to the FMRs themselves avoids
this (which is what NFS/RDMA does), but iSER and SRP both use the
pool API, don't they?
So what else sends an RDMA write into the weeds? Short of writing
to the wrong address, it sure sounds like a DMA consistency problem to
me. The connection wasn't lost, so the HCA never reported an error.
Tom.