[ewg] Mellanox target workaround in SRP

Vu Pham vuhuong at mellanox.com
Mon Jan 10 10:21:07 PST 2011



David Dillow wrote:
> On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>> > I'm sure this was tested and shown to fix the problem; I'm just confused
>>  > as to what the problem really was and if this is still relevant. Can
>>  > someone please enlighten me?
>>
>> At this point I'm afraid it's all lost in the mists of time,
> 
> Yep, that's my fear. And since it is a corruption bug, I've got to tread
> lightly in this area. :/
>

I don't recall discussing or reviewing this patch with Michael Tsirkin when he submitted it.


>> looking at the patch, I would guess that the corruption occurred when
>> the target got an IO request that started at a non-page-aligned address
>> but that spanned more than one page.
> 
> That's my thought as well, but then I'm not sure this really solved
> their problem. It may be more likely to occur in the FMR case, but the
> initiator enables clustering, so blk_rq_map_sg() could generate the same
> kinds of requests for both direct and indirect descriptors, even without
> FMR. This looks to have been true since the initiator was added to the
> kernel, though it is possible I'm misreading the code.
> 
>> I don't know if the target was ever fixed, or whether that target code
>> has any relevance today.
> 
> Here's hoping someone from Mellanox can shed some light.


I think the patch is specific to the SRP initiator using Mellanox FMR. It tried to avoid indirect descriptors with a Mellanox FMR having first_byte_offset != 0, since the low-level implementation of mlx4/mthca_map_phys_fmr() did not create and set up the MPT for an FMR with first_byte_offset != 0. Because the problem is on the initiator side, the corruption can happen with any target.
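
To make the condition concrete, here is a minimal userspace sketch. It is not the actual ib_srp or mlx4/mthca code: the page size and every sketch_* name are assumptions made up for illustration, and the real driver logic may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical helper: offset of the buffer's first byte within its page. */
static inline uint64_t sketch_first_byte_offset(uint64_t dma_addr)
{
    return dma_addr & (SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical helper: true if [dma_addr, dma_addr + len) touches more
 * than one page. */
static inline bool sketch_spans_pages(uint64_t dma_addr, size_t len)
{
    if (len == 0)
        return false;
    return (dma_addr / SKETCH_PAGE_SIZE) !=
           ((dma_addr + len - 1) / SKETCH_PAGE_SIZE);
}

/* Hypothetical policy: skip FMR mapping whenever the first byte offset is
 * non-zero, which is the case the workaround tried to avoid. */
static inline bool sketch_should_avoid_fmr(uint64_t dma_addr)
{
    return sketch_first_byte_offset(dma_addr) != 0;
}
```

For example, a buffer at 0x1200 has a first byte offset of 0x200, so this sketch would skip FMR for it, while a buffer at 0x2000 is page-aligned and would be mapped as usual. sketch_spans_pages() is only there to express the "non-page-aligned and spanning more than one page" case guessed at earlier in the thread.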

-vu





