[ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US Pacific Time (12pm EST) - Minutes

Vladimir Sokolovsky vlad at dev.mellanox.co.il
Wed Mar 7 06:49:25 PST 2018


Hi Michal,

This issue is already documented in OFED_release_notes.txt:

===============================================================================
3. Known Issues
===============================================================================
...

22. Bug 2640 - openibd fails to start when the system is coming up:
     The inbox kernel modules are loaded from the initrd,
     so the initrd needs to be rebuilt with:
     # dracut -f -v
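
To check whether an initrd is affected before rebuilding it, something like the following could be used. This is only a minimal sketch and is not part of the release notes: it assumes a dracut-based system where lsinitrd lists the image contents, the helper name modules_in_initrd is hypothetical, and qed/qede/qedr are simply the modules from the report quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED): reads an lsinitrd-style file listing
# on stdin and prints each named module that is packed into the image as a
# .ko file, optionally compressed (.ko.xz, .ko.gz, ...).
modules_in_initrd() {
    listing=$(cat)
    for mod in "$@"; do
        # Match ".../<mod>.ko" or ".../<mod>.ko.<ext>" at the end of a line.
        if printf '%s\n' "$listing" | grep -Eq "/${mod}\.ko(\.[a-z0-9]+)?\$"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Example usage on a dracut-based system (run as root):
#   lsinitrd | modules_in_initrd qed qede qedr
# A module printed here is present in the initrd and may be loaded from there
# at boot instead of the OFED build, so the image likely needs rebuilding.
```

The helper only matches the module name exactly as given, so a module whose file name uses a hyphen where the loaded name uses an underscore would have to be passed in its file-name form.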


Regards,

Vladimir


On 03/07/2018 04:44 PM, Kalderon, Michal wrote:
> Hi Vlad and All,
>
> During soft-forge testing of the package containing the fixes, we hit a different issue, which I believe could affect all vendors.
>
> The problem arises when drivers are part of the distro's initramfs/initrd because they ship inbox.
> In our case qed/qede are inbox, but qedr is not. This means that qed/qede are in the initrd and are loaded automatically in an older version
> than the one installed by OFED. Then, when qedr is probed, there is a mismatch, as the OFED qedr version is newer and incompatible.
> If qedr were inbox as well, we probably wouldn't even have noticed that the OFED drivers aren't loaded
> (they are loaded only after an rmmod of all qed drivers followed by a modprobe).
>
> In our out-of-box installation scripts, we update the initramfs with the newly compiled drivers. I'd expect to see the same at the end of an OFED build
> (for example dracut -f, update initrd, mkinitrd, etc.).
>
> Alternatively, we could just document that the user needs to rebuild the initramfs after OFED installation.
> I'm not sure how this didn't come up until now. I looked at some of our setups and noticed that many have omit-drivers in the lsinitrd output, or older drivers that
> were installed, etc., so this is easy to miss. In addition, the only reason we hit this is that the inbox and OFED drivers are mismatched.
>
> Thanks,
> Michal
>
>
>> -----Original Message-----
>> From: Davis, Arlin R [mailto:arlin.r.davis at intel.com]
>> Sent: Monday, March 05, 2018 4:34 AM
>> To: Kalderon, Michal <Michal.Kalderon at cavium.com>;
>> ewg at lists.openfabrics.org
>> Cc: Woodruff, Robert J <robert.j.woodruff at intel.com>; Vladimir Sokolovsky
>> <vlad at mellanox.com>; Amrani, Ram <Ram.Amrani at cavium.com>; Rahman,
>> Ameen <Ameen.Rahman at cavium.com>; Brendan Myers
>> <Brendan.Myers at soft-forge.com>; Vladimir Sokolovsky
>> <vlad at mellanox.com>
>> Subject: RE: OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US Pacific
>> Time (12pm EST) - Minutes
>>
>> Hello Michal,
>>
>> Great progress, kudos to the team for a quick resolution. I concur, let's test
>> before we roll RC3.
>>
>> Vlad, please pull these fixes into a new daily build so Brendan can test.
>>
>> Arlin
>>
>>
>>> Hi Arlin,
>>>
>>> We've been working with Brendan on this and were able to reproduce on
>>> our setups, fix, and test locally.
>>> There are three commits (2 fix the issue and 1 fixes a problem this
>>> exposed in our data collection).
>>> 2 of the 3 fixes are already upstream in official Linux releases.
>>> One of the fixes can't go through next as is, as the code varies quite a bit.
>>>
>>> Brendan will only be able to fully verify the fix Monday / Tuesday.
>>>
>>> The commits that need to be pulled are in my github:
>>>
>>> https://github.com/mkalderon/ofed-compat-rdma/commit/f20134d8f4736c6ce30975bb920cf64c2ec4248d
>>> https://github.com/mkalderon/ofed-compat-rdma/commit/171235eb14bf2a7bccd28650470c44807ea644e4
>>> https://github.com/mkalderon/ofed-compat-rdma/commit/4c5949ba5d075d814e30dc18bd4cdd71b45c972f
>>>
>>> I would prefer Brendan gave this a test before rc-3. But I understand
>>> we're on a tight timeframe.
>>>
>>> thanks,
>>> Michal
>>>
>>> ________________________________________
>>> From: Davis, Arlin R <arlin.r.davis at intel.com>
>>> Sent: Friday, March 2, 2018 9:50 PM
>>> To: ewg at lists.openfabrics.org
>>> Cc: Kalderon, Michal; Woodruff, Robert J; Vladimir Sokolovsky; Amrani,
>>> Ram; Rahman, Ameen
>>> Subject: RE: OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
>>> Pacific Time (12pm EST) - Minutes
>>>
>>> Quick update on RC3..
>>>
>>> Broadcom has all critical bugs fixed and included in a new daily build.
>>> Thanks!
>>> http://downloads.openfabrics.org/OFED/ofed-4.8-2-daily/OFED-4.8-2-20180228-1121.tgz
>>>
>>>
>>> Our final blocking item is a critical "perftest hang" issue on a
>>> Cavium QL45412 RoCE adapter.
>>>
>>> Bug 2674<http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2674>
>>> "Unable to complete RDMA applications (perftest)".
>>>
>>> Michal, can we please get an ETA for the fix or a "won't fix"
>>> disposition so we can push forward with RC3?
>>>
>>> Regards,
>>>
>>> Arlin
>>>
>>> From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of
>>> Davis, Arlin R
>>> Sent: Monday, February 26, 2018 1:04 PM
>>> To: ewg at lists.openfabrics.org
>>> Subject: [ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
>>> Pacific Time (12pm EST) - Minutes
>>>
>>> Attendees:
>>>
>>> Rupert Dance                    SW Forge
>>> Pradeep Kankipati            Broadcom
>>> Robert Woodruff              Intel
>>> Arlin Davis                         Intel
>>> Michal Kalderon               Cavium
>>> Vladimir Sokolovsky         Mellanox
>>>
>>>
>>> Minutes:
>>>
>>>
>>>
>>> ·        Opens
>>>
>>> o   Broadcom's RC1 validation testing uncovered a new critical bug. A fix is in
>>> the works; Broadcom would like to get the fix into 4.8-2
>>>
>>> §  Broadcom will open a new bug with details. (FIO stress test caused a hang)
>>>
>>>
>>>
>>> ·        OFED 4.8-2 RC2 status:
>>> http://downloads.openfabrics.org/OFED/ofed-4.8-2/OFED-4.8-2-rc2.tgz
>>>
>>> o   Release Notes:
>>> http://downloads.openfabrics.org/OFED/release_notes/OFED_4.8-2-rc2-release_notes
>>>
>>> o   Test Status:
>>>
>>> §  Intel - RC2 build/validation (mlx4/5) RH 7.1, 7.2, 7.3, 7.4;
>>> SLES 12.1, 12.2, 12.3 - Passed
>>>
>>> §  VMware - RC2 validation complete - Passed
>>>
>>> §  IWG interop results - new sightings for Cavium (perftest) and
>>> Broadcom (FW update?).
>>>
>>> ·        Rupert will work with Cavium/Broadcom to get OFED inbox driver
>>> versions passing.
>>>
>>> ·        Note: for PF33 RoCE interop, we prefer to use OFED inbox instead of
>>> out-of-box drivers.
>>>
>>> o   Bugs:
>>>
>>> §  All - please open new bugs for any new sighting
>>>
>>>
>>>
>>> ·        OFED 4.8-2 GA --  Not ready
>>>
>>> o   RC3 needed for new Broadcom bug and to get PF33 RoCE interop tests
>>> passing with OFED inbox drivers.
>>>
>>>
>>>
>>> ·        OFED next
>>>
>>> o   No discussion, OFED 4.8-2 going to RC3.
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Arlin
>>>