[ewg] Mellanox target workaround in SRP

Vu Pham vuhuong at mellanox.com
Mon Jan 10 11:58:02 PST 2011



David Dillow wrote:
> On Mon, 2011-01-10 at 10:21 -0800, Vu Pham wrote:
>> David Dillow wrote:
>>> On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>>>> looking at the patch, I would guess that the corruption occurred when
>>>> the target got an IO request that started at a non-page-aligned address
>>>> but that spanned more than one page.
> [snip]
>>> Here's hoping someone from Mellanox can shed some light.
>>
>> I think that the patch is specific to an srp initiator using Mellanox
>> FMR. It tried to avoid an indirect desc with a Mellanox FMR having
>> first_byte_offset != 0.
>> Since the low-level implementation of mlx4/mthca_map_phys_fmr() did
>> not create and set up the MPT for an FMR with first_byte_offset != 0,
>> the corruption can happen with any target.
> 
> Thanks for taking a look Vu --

Thanks for taking ownership of srp :)

> but I'm not sure that is the problem,
> either. The SRP FMR mapping code is careful to mask the SG address with
> the FMR page mask, so we should never ask the HCA to map a page with the
> first_byte_offset != 0. Instead, we tell the target to request an IO
> virtual address appropriately offset into the first page of the FMR.
> 
> Or perhaps I misunderstood you, and it's the non-zero first byte offset
> in the RDMA command on the wire that is the issue, and not the FMR setup
> in the initiator? And it only affects FMR-mapped memory, not the
> kernel's MR?
> 

It's not the kernel's MR.

I suspect that the corruption happens *only* with a Mellanox FMR whose MPT
is set up without an fbo (first_byte_offset) while the target does RDMA
with an offset vaddr.

I need to ask the internal hw/fw guys to confirm whether that's true.
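
To make the scenario concrete, here is a minimal user-space sketch of the
masking David describes and of the offset vaddr the target would then use.
It is not the ib_srp or mlx4/mthca code: the 4 KiB FMR page size and every
name in it (fmr_split, split_sg_addr, advertised_vaddr, pages_spanned) are
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB FMR pages. The real page size depends on the setup. */
#define FMR_PAGE_SHIFT 12
#define FMR_PAGE_SIZE  ((uint64_t)1 << FMR_PAGE_SHIFT)
#define FMR_PAGE_MASK  (~(FMR_PAGE_SIZE - 1))

/* Hypothetical result type: an SG address split at the FMR page boundary. */
struct fmr_split {
	uint64_t page_base;         /* page-aligned address handed to the FMR map */
	uint64_t first_byte_offset; /* offset of the buffer within that first page */
};

/* Mask the SG address so the FMR only ever sees page-aligned addresses. */
static struct fmr_split split_sg_addr(uint64_t dma_addr)
{
	struct fmr_split s;

	s.page_base = dma_addr & FMR_PAGE_MASK;
	s.first_byte_offset = dma_addr & ~FMR_PAGE_MASK;
	return s;
}

/*
 * The vaddr advertised to the target: the FMR's base IO virtual address
 * plus the offset into the first page. This is the "offset vaddr" case.
 */
static uint64_t advertised_vaddr(uint64_t fmr_iova, uint64_t dma_addr)
{
	return fmr_iova + split_sg_addr(dma_addr).first_byte_offset;
}

/* Number of FMR pages touched by a buffer of len bytes at dma_addr. */
static uint64_t pages_spanned(uint64_t dma_addr, uint64_t len)
{
	uint64_t off = split_sg_addr(dma_addr).first_byte_offset;

	assert(len > 0);
	return (off + len + FMR_PAGE_SIZE - 1) >> FMR_PAGE_SHIFT;
}
```

With these assumptions, an SG address of 0x12345678 splits into page base
0x12345000 and offset 0x678, and a 4096-byte IO starting there touches two
pages, which is the non-page-aligned, multi-page case Roland guessed at.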

-vu




