[ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US Pacific Time (12pm EST) - Minutes

Kalderon, Michal Michal.Kalderon at cavium.com
Wed Mar 7 06:44:18 PST 2018


Hi Vlad and All, 

During soft-forge testing of the package containing the fixes, we hit a different issue that I believe could affect all vendors.

The problem occurs when drivers are part of the distro's initramfs/initrd because they ship inbox.
In our case qed/qede are inbox but qedr is not, so qed/qede are in the initrd and are loaded automatically with an older version
than the one installed by OFED. Then, when qedr is probed, there is a mismatch, because the OFED qedr version is newer and incompatible.
If qedr were inbox as well, we probably wouldn't even have noticed that the OFED drivers aren't loaded
(they are loaded only after an rmmod of all qed drivers followed by a modprobe).

In our out-of-box installation scripts, we update the initramfs with the newly compiled drivers. I'd expect to see the same at the end of an OFED build
(for example: dracut -f / update initrd, mkinitrd, etc.).

We could just document this for the user: rebuild the ramfs after OFED installation.
I'm not sure how this didn't come up until now. I've looked a bit at our setups and noticed that a lot of them have omit-drivers in the lsinitrd output, or older drivers that
were installed, etc., so this is easy to miss. In addition, the only reason we did hit this is that the inbox and OFED drivers are mismatched.
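
A minimal sketch of how the check described above could be automated. It assumes the input is a text listing of the initramfs contents (such as lsinitrd typically prints) with one file path at the end of each line. The function name and the sample module names are hypothetical and chosen for illustration only; this is not part of OFED or of any vendor's install scripts.

```python
import re

# Matches a kernel module file name at the end of a listing line,
# optionally compressed (.xz, .gz, .zst). This pattern is an assumption
# about how module files are named inside the initramfs.
_KO_RE = re.compile(r"/([^/\s]+)\.ko(?:\.(?:xz|gz|zst))?\s*$")


def modules_in_initramfs_listing(listing, wanted):
    """Return the subset of `wanted` module names found in `listing`.

    `listing` is the text of an initramfs file listing, one entry per line.
    `wanted` is a set of module names, e.g. {"qed", "qede", "qedr"}.
    """
    found = set()
    for line in listing.splitlines():
        match = _KO_RE.search(line)
        if match:
            # Module names treat '-' and '_' as equivalent; normalize to '_'.
            name = match.group(1).replace("-", "_")
            if name in wanted:
                found.add(name)
    return found
```

Any module reported by this function is present in the initramfs and may therefore be loaded from there at boot instead of from the newly installed OFED tree, which is a hint that the initramfs should be rebuilt after installation.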

Thanks,
Michal


> -----Original Message-----
> From: Davis, Arlin R [mailto:arlin.r.davis at intel.com]
> Sent: Monday, March 05, 2018 4:34 AM
> To: Kalderon, Michal <Michal.Kalderon at cavium.com>;
> ewg at lists.openfabrics.org
> Cc: Woodruff, Robert J <robert.j.woodruff at intel.com>; Vladimir Sokolovsky
> <vlad at mellanox.com>; Amrani, Ram <Ram.Amrani at cavium.com>; Rahman,
> Ameen <Ameen.Rahman at cavium.com>; Brendan Myers
> <Brendan.Myers at soft-forge.com>; Vladimir Sokolovsky
> <vlad at mellanox.com>
> Subject: RE: OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US Pacific
> Time (12pm EST) - Minutes
> 
> Hello Michal,
> 
> Great progress, kudos to the team for a quick resolution. I concur, let's test
> before we roll RC3.
> 
> Vlad, please pull these fixes into new daily build so Brendan can test.
> 
> Arlin
> 
> 
> > Hi Arlin,
> >
> > We've been working with Brendan on this and were able to reproduce on
> > our setups, fix, and test locally.
> > There are three commits (2 fix the issue and 1 fixes a problem this exposed
> > in our data collection).
> > 2 out of the 3 fixes are already upstream in official Linux revisions.
> > One of the fixes can't go through next as is, as the code varies quite a bit.
> >
> > Brendan will only be able to fully verify the fix Monday / Tuesday.
> >
> > The commits that need to be pulled are in my github:
> >
> > https://github.com/mkalderon/ofed-compat-rdma/commit/f20134d8f4736c6ce30975bb920cf64c2ec4248d
> > https://github.com/mkalderon/ofed-compat-rdma/commit/171235eb14bf2a7bccd28650470c44807ea644e4
> > https://github.com/mkalderon/ofed-compat-rdma/commit/4c5949ba5d075d814e30dc18bd4cdd71b45c972f
> >
> > I would prefer Brendan gave this a test before rc-3. But I understand
> > we're on a tight timeframe.
> >
> > thanks,
> > Michal
> >
> > ________________________________________
> > From: Davis, Arlin R <arlin.r.davis at intel.com>
> > Sent: Friday, March 2, 2018 9:50 PM
> > To: ewg at lists.openfabrics.org
> > Cc: Kalderon, Michal; Woodruff, Robert J; Vladimir Sokolovsky; Amrani,
> > Ram; Rahman, Ameen
> > Subject: RE: OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
> > Pacific Time (12pm EST) - Minutes
> >
> > Quick update on RC3..
> >
> > Broadcom has all critical bugs fixed and included in a new daily build.
> > Thanks!
> > http://downloads.openfabrics.org/OFED/ofed-4.8-2-daily/OFED-4.8-2-20180228-1121.tgz
> >
> >
> > Our final blocking item is a critical "perftest hang" issue on a
> > Cavium QL45412 RoCE adapter.
> >
> > Bug 2674<http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2674>
> > "Unable to complete RDMA applications (perftest)".
> >
> > Michal, can we please get an ETA for the fix or a "won't fix"
> > disposition so we can push forward with RC3?
> >
> > Regards,
> >
> > Arlin
> >
> > From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of
> > Davis, Arlin R
> > Sent: Monday, February 26, 2018 1:04 PM
> > To: ewg at lists.openfabrics.org
> > Subject: [ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
> > Pacific Time (12pm EST) - Minutes
> >
> > Attendees:
> >
> > Rupert Dance                    SW Forge
> > Pradeep Kankipati            Broadcom
> > Robert Woodruff              Intel
> > Arlin Davis                         Intel
> > Michal Kalderon               Cavium
> > Vladimir Sokolovsky         Mellanox
> >
> >
> > Minutes:
> >
> >
> >
> > ·        Opens
> >
> > o   Broadcom's RC1 validation testing uncovered new critical bug. Fix is in
> > the works, would like to get fix into 4.8-2
> >
> > §  Broadcom will open new bug with details. (FIO stress test caused hang)
> >
> >
> >
> > ·        OFED 4.8-2 RC2 status:
> > http://downloads.openfabrics.org/OFED/ofed-4.8-2/OFED-4.8-2-rc2.tgz
> >
> > o   Release Notes:
> > http://downloads.openfabrics.org/OFED/release_notes/OFED_4.8-2-rc2-release_notes
> >
> > o   Test Status:
> >
> > §  Intel - RC2 build/validation (mlx4/5) RH 7.1, 7.2, 7.3, 7.4 SLES
> > 12.1, 12.2, 12.3 - Passed
> >
> > §  VMware - RC2 validation complete - Passed
> >
> > §  IWG interop results - new sightings for Cavium (perftest) and
> > Broadcom (FW update?).
> >
> > ·        Rupert will work with Cavium/Broadcom to get OFED inbox driver
> > versions passing.
> >
> > ·        Note: for PF33 RoCE interop, we prefer to use OFED inbox instead of
> > out-of-box drivers.
> >
> > o   Bugs:
> >
> > §  All - please open new bugs for any new sighting
> >
> >
> >
> > ·        OFED 4.8-2 GA --  Not ready
> >
> > o   RC3 needed for new Broadcom bug and to get PF33 RoCE interop tests
> > passing with OFED inbox drivers.
> >
> >
> >
> > ·        OFED next
> >
> > o   No discussion, OFED 4.8-2 going to RC3.
> >
> >
> >
> > Regards,
> >
> >
> >
> > Arlin
> >



