[ewg] Re: [ofa-general] iser/lustre memfree issues
Michael S. Tsirkin
mst at dev.mellanox.co.il
Wed Apr 11 03:50:29 PDT 2007
> Quoting Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [ofa-general] iser/lustre memfree issues
>
> Roland Dreier wrote:
> > > 472 Data corruption with Lustre+OFED when using FMR on memfree HCAs
> > >
> > > We see it with iSER too, basically only on SCSI reads, which from the
> > > IB perspective are RDMA writes from the target to the initiator.
> > >
> > > The environment where we see it is Sinai (25204) with hw_ver=A0 and fw_ver=1.2.0.
> > >
> > > Ishai did not manage to reproduce it with SRP, but the fact that it
> > > reproduces with two independent ULPs makes it a blocker, I think.
> >
> > We definitely need more info here. Why are you confident that the two
> > problems are the same bug?
> >
> > Have you tested with mem-free Arbel, and does the problem occur there
> > too? Or have you only tested Sinai? Does the problem go away if you
> > remove the MTHCA_FLAG_SINAI_OPT flag from the mthca_hca_table[] entry
> > in mthca_main.c?
>
> Hi Roland,
>
> We don't have a memfree Arbel here. However, your suggestion to remove the
> MTHCA_FLAG_SINAI_OPT flag from the mthca_hca_table[] entry in
> mthca_main.c seems to provide a workaround (and hopefully a direction
> for solving the problem): it has been running for two hours now without
> reproducing the corruption. I will leave it running overnight and let you know.
>
> Do you have any idea why the code breaks with
> MTHCA_FLAG_SINAI_OPT?
>
> Thanks again,
Removing MTHCA_FLAG_SINAI_OPT actually changes several things at once.
Let's try changing them one at a time and see what happens.
Could you try commenting out just these two lines in mthca_cmd.c:
    if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
        MTHCA_PUT(inbox, 0x1, INIT_HCA_FLAGS1_OFFSET);
(after reverting your change, that is, keeping MTHCA_FLAG_SINAI_OPT set as it was originally)
and see what happens?
For convenience, the following patch should do this.
---
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 7131446..abdb355 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1315,6 +1315,7 @@ int mthca_INIT_HCA(struct mthca_dev *dev,
memset(inbox, 0, INIT_HCA_IN_SIZE);
+ if (0)
if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
MTHCA_PUT(inbox, 0x1, INIT_HCA_FLAGS1_OFFSET);
--
MST