[ewg] Re: [ofa-general] iser/lustre memfree issues

Michael S. Tsirkin mst at dev.mellanox.co.il
Wed Apr 11 03:50:29 PDT 2007


> Quoting Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [ofa-general] iser/lustre memfree issues
> 
> Roland Dreier wrote:
> >  > 472 Data corruption with Lustre+OFED when using FMR on memfree HCAs
> >  > 
> >  > We see it also with iser, basically only on scsi --read-- which from
> >  > IB perspective is RDMA write from the target to the initiator.
> >  > 
> >  > The env we see it is Sinai (25204) hw_ver=A0 and fw_ver=1.2.0
> >  > 
> >  > Ishai did not manage to reproduce it with SRP, but the fact it
> >  > reproduced with two independent ULPs makes it a blocker, i think.
> > 
> > We definitely need more info here.  Why are you confident that the two
> > problems are the same bug?
> > 
> > Have you tested with mem-free Arbel, and does the problem occur there
> > too?  Or have you only tested Sinai?  Does the problem go away if you
> > remove the MTHCA_FLAG_SINAI_OPT flag from the mthca_hca_table[] entry
> > in mthca_main.c?
> 
> Hi Roland,
> 
> We don't have memfree Arbel here. However, your suggestion to remove the
> MTHCA_FLAG_SINAI_OPT flag from the mthca_hca_table[] entry in
> mthca_main.c seems to provide a workaround (and hopefully a direction
> for solving the problem). It has been running for two hours now without
> reproducing the corruption. I will leave it running overnight and let you know.
> 
> Do you have any idea why the code breaks with
> MTHCA_FLAG_SINAI_OPT?
> 
> thanks again,

Removing MTHCA_FLAG_SINAI_OPT actually changes several things.
Let's try changing them one at a time and see what happens.

Could you revert your change (that is, keep MTHCA_FLAG_SINAI_OPT set as it
was originally), comment out just these 2 lines in mthca_cmd.c:

	if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
		MTHCA_PUT(inbox, 0x1, INIT_HCA_FLAGS1_OFFSET);

and see what happens?

For convenience, the following patch should do this.

---

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 7131446..abdb355 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1315,6 +1315,7 @@ int mthca_INIT_HCA(struct mthca_dev *dev,
 
 	memset(inbox, 0, INIT_HCA_IN_SIZE);
 
+	if (0)
 	if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
 		MTHCA_PUT(inbox, 0x1, INIT_HCA_FLAGS1_OFFSET);
 


-- 
MST


