[ewg] Re: [ofa-general] iser/lustre memfree issues

Or Gerlitz ogerlitz at voltaire.com
Wed Apr 11 03:39:57 PDT 2007


Roland Dreier wrote:
>  > 472 Data corruption with Lustre+OFED when using FMR on memfree HCAs
>  > 
>  > We see it also with iser, basically only on scsi --read-- which from
>  > IB perspective is RDMA write from the target to the initiator.
>  > 
>  > The env we see it is Sinai (25204) hw_ver=A0 and fw_ver=1.2.0
>  > 
>  > Ishai did not manage to reproduce it with SRP, but the fact it
>  > reproduced with two independent ULPs makes it a blocker, i think.
> 
> We definitely need more info here.  Why are you confident that the two
> problems are the same bug?
> 
> Have you tested with mem-free Arbel, and does the problem occur there
> too?  Or have you only tested Sinai?  Does the problem go away if you
> remove the MTHCA_FLAG_SINAI_OPT flag from the mthca_hca_table[] entry
> in mthca_main.c?

Hi Roland,

We don't have a memfree Arbel here. However, your suggestion to remove
the MTHCA_FLAG_SINAI_OPT flag from the mthca_hca_table[] entry in
mthca_main.c seems to provide a workaround (and hopefully a direction
toward solving the problem): it has been running for two hours now
without reproducing the corruption. I will leave it running overnight
and let you know.
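
For anyone who wants to try the same workaround, here is a rough,
self-contained sketch of the idea. It is not the actual mthca source:
the sketch_* names and the flag bit values are hypothetical, and the
table holds only a Sinai-like entry, modeled loosely on the per-device
flags kept in mthca_hca_table[]. It only illustrates that masking the
Sinai optimization bit leaves the other device flags untouched.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real driver's values may differ. */
enum {
    SKETCH_FLAG_MEMFREE   = 1 << 0,
    SKETCH_FLAG_PCIE      = 1 << 1,
    SKETCH_FLAG_SINAI_OPT = 1 << 2,
};

/* Hypothetical device index; only a Sinai-like entry is modeled. */
enum sketch_hca { SKETCH_SINAI, SKETCH_HCA_COUNT };

struct sketch_hca_entry {
    uint32_t flags;
};

/* Per-device flag table, loosely modeled on mthca_hca_table[]. */
static const struct sketch_hca_entry sketch_hca_table[SKETCH_HCA_COUNT] = {
    [SKETCH_SINAI] = { .flags = SKETCH_FLAG_MEMFREE |
                                SKETCH_FLAG_PCIE    |
                                SKETCH_FLAG_SINAI_OPT },
};

/* Return the flags for a device, optionally masking the Sinai
 * optimization bit (the workaround under test). */
static uint32_t sketch_flags(enum sketch_hca type, int disable_sinai_opt)
{
    uint32_t flags = sketch_hca_table[type].flags;

    if (disable_sinai_opt)
        flags &= ~(uint32_t)SKETCH_FLAG_SINAI_OPT;
    return flags;
}

/* Nonzero when the Sinai-specific code paths would be taken. */
static int sketch_uses_sinai_opt(uint32_t flags)
{
    return (flags & SKETCH_FLAG_SINAI_OPT) != 0;
}
```

In our test the flag was simply deleted from the table entry and the
module rebuilt; the run-time switch above is only a convenience for
the sketch.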

Do you have any idea why the code breaks with MTHCA_FLAG_SINAI_OPT?

thanks again,

Or.




