[ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US Pacific Time (12pm EST) - Minutes
Vladimir Sokolovsky
vlad at dev.mellanox.co.il
Wed Mar 7 07:13:41 PST 2018
Michal,
I am not in favor of automatically rebuilding the initrd, as it can cause
the loss of customized configuration or other undesired results. For example,
if one of the drivers has load issues, it can crash the system at boot.
Of course, you are welcome to update the release notes.
PS: As you know, it will not help anyone who ignores the release
notes :)
Regards,
Vladimir
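
For anyone following the release-note instruction by hand, the manual steps could be sketched roughly as below. This is only an illustration and not part of the OFED scripts: the helper names initrd_has_module and plan_rebuild are hypothetical, the image path assumes a RHEL-style /boot/initramfs-<kver>.img layout (other distros name it differently), and the script only prints the commands, so nothing changes until an admin runs them. Copying the old image first is one way to keep a known image available if the rebuilt one misbehaves at boot.

```shell
#!/bin/sh
# Sketch only: nothing here modifies the system; commands are printed, not run.

# Succeeds if the lsinitrd listing on stdin contains module $1.
# Matches qed.ko, qed.ko.xz, ... but not qede.ko.
initrd_has_module() {
    grep -q "/$1\.ko"
}

# Print the manual rebuild steps for kernel version $1.
# Assumption: RHEL-style image path /boot/initramfs-<kver>.img.
plan_rebuild() {
    img="/boot/initramfs-$1.img"
    # Keep a copy of the current image before overwriting it.
    echo "cp -p $img $img.bak"
    # dracut accepts an explicit image path and kernel version.
    echo "dracut -f -v $img $1"
}
```

For example, `lsinitrd | initrd_has_module qed && plan_rebuild "$(uname -r)"` would print the two steps only when the current image actually carries qed.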
On 03/07/2018 04:56 PM, Kalderon, Michal wrote:
> Thanks for the quick response.
>
> Perhaps it should be documented in a more generic way, as this relates not only to openibd
> but to all drivers that are part of the initrd.
>
> And for the next versions, maybe it should be part of the build script, with dracut -f --add-drivers <driver> for every driver in compat-rdma?
>
> Thanks,
> Michal
>
>
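
As an illustration of the --add-drivers suggestion above, a build script could assemble the dracut command line roughly like this. It is a sketch only: the function name dracut_add_drivers_cmd is hypothetical, the driver names in the usage line are simply the ones mentioned in this thread, and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: builds the command line as text; it does not run dracut.

# Print a dracut invocation that adds the given drivers to the initramfs.
# dracut's --add-drivers takes one space-separated list as a single argument,
# so all names are joined into one quoted string.
dracut_add_drivers_cmd() {
    if [ "$#" -eq 0 ]; then
        # No drivers given: fall back to a plain forced rebuild.
        echo "dracut -f"
    else
        echo "dracut -f --add-drivers \"$*\""
    fi
}
```

For example, `dracut_add_drivers_cmd qed qede qedr` prints one command covering all three drivers instead of one dracut run per driver.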
>> -----Original Message-----
>> From: Vladimir Sokolovsky [mailto:vlad at dev.mellanox.co.il]
>> Sent: Wednesday, March 07, 2018 4:49 PM
>> To: Kalderon, Michal <Michal.Kalderon at cavium.com>; Davis, Arlin R
>> <arlin.r.davis at intel.com>; ewg at lists.openfabrics.org; Vladimir Sokolovsky
>> <vlad at mellanox.com>
>> Cc: Rahman, Ameen <Ameen.Rahman at cavium.com>; Elior, Ariel
>> <Ariel.Elior at cavium.com>; Tayar, Tomer <Tomer.Tayar at cavium.com>
>> Subject: Re: [ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
>> Pacific Time (12pm EST) - Minutes
>>
>> Hi Michal,
>>
>> This issue is documented already in the OFED_release_notes.txt:
>>
>> ===============================================================================
>> 3. Known Issues
>> ===============================================================================
>> ...
>>
>> 22. Bug 2640 - openibd fail to start when system is coming up:
>> The inbox kernel modules being loaded from initrd
>> So, need to rebuild the initrd by:
>> # dracut -f -v
>>
>>
>> Regards,
>>
>> Vladimir
>>
>>
>> On 03/07/2018 04:44 PM, Kalderon, Michal wrote:
>>> Hi Vlad and All,
>>>
>>> During soft-forge testing of the package containing the fixes, we hit
>>> a different issue, which I believe could affect all vendors.
>>>
>>> The problem occurs when drivers are part of the distro's initramfs/initrd
>>> because they come inbox.
>>> In our case qed/qede are inbox, but qedr is not. This means that qed/qede
>>> are in the initrd and are loaded automatically, with an older version than
>>> the one installed by OFED. Then, when qedr is probed, there is a mismatch,
>>> as the OFED qedr version is newer and incompatible.
>>> If qedr were inbox as well, we probably wouldn't even have noticed that
>>> the OFED drivers aren't loaded.
>>> (They will be loaded only after an rmmod of all qed drivers and a
>>> modprobe.)
>>>
>>> In our out-of-box installation scripts, we update the initramfs with
>>> the newly compiled drivers. I'd expect to see the same at the end of
>>> an OFED build (for example dracut -f, update initrd, mkinitrd, etc.).
>>>
>>> We could just document that the user should rebuild the ramfs after OFED
>>> installation.
>>> I'm not sure how this didn't come up until now. I've looked a bit at
>>> our setups and noticed that a lot have omit-drivers in the lsinitrd output,
>>> or older drivers that were installed, etc., so this is easy to miss. In
>>> addition, the only reason we did hit this is that the inbox and OFED
>>> drivers are mismatched.
>>> Thanks,
>>> Michal
>>>
>>>
>>>> -----Original Message-----
>>>> From: Davis, Arlin R [mailto:arlin.r.davis at intel.com]
>>>> Sent: Monday, March 05, 2018 4:34 AM
>>>> To: Kalderon, Michal <Michal.Kalderon at cavium.com>;
>>>> ewg at lists.openfabrics.org
>>>> Cc: Woodruff, Robert J <robert.j.woodruff at intel.com>; Vladimir
>>>> Sokolovsky <vlad at mellanox.com>; Amrani, Ram <Ram.Amrani at cavium.com>;
>>>> Rahman, Ameen <Ameen.Rahman at cavium.com>; Brendan Myers
>>>> <Brendan.Myers at soft-forge.com>; Vladimir Sokolovsky
>>>> <vlad at mellanox.com>
>>>> Subject: RE: OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
>>>> Pacific Time (12pm EST) - Minutes
>>>>
>>>> Hello Michal,
>>>>
>>>> Great progress, kudos to the team for a quick resolution. I concur,
>>>> let's test before we roll RC3.
>>>>
>>>> Vlad, please pull these fixes into a new daily build so Brendan can test.
>>>>
>>>> Arlin
>>>>
>>>>
>>>>> Hi Arlin,
>>>>>
>>>>> We've been working with Brendan on this and were able to reproduce
>>>>> the issue on our setups, fix it, and test locally.
>>>>> There are three commits (2 fix the issue, and 1 fixes an issue that this
>>>>> exposed in our data collection).
>>>>> 2 of the 3 fixes are already upstream in official Linux releases.
>>>>> One of the fixes can't go through next as is, because the code varies
>>>>> quite a bit.
>>>>>
>>>>> Brendan will only be able to fully verify the fix Monday / Tuesday.
>>>>>
>>>>> The commits that need to be pulled are in my github:
>>>>>
>>>>> https://github.com/mkalderon/ofed-compat-rdma/commit/f20134d8f4736c6ce30975bb920cf64c2ec4248d
>>>>> https://github.com/mkalderon/ofed-compat-rdma/commit/171235eb14bf2a7bccd28650470c44807ea644e4
>>>>> https://github.com/mkalderon/ofed-compat-rdma/commit/4c5949ba5d075d814e30dc18bd4cdd71b45c972f
>>>>>
>>>>> I would prefer Brendan gave this a test before rc-3. But I
>>>>> understand we're on a tight timeframe.
>>>>>
>>>>> thanks,
>>>>> Michal
>>>>>
>>>>> ________________________________________
>>>>> From: Davis, Arlin R <arlin.r.davis at intel.com>
>>>>> Sent: Friday, March 2, 2018 9:50 PM
>>>>> To: ewg at lists.openfabrics.org
>>>>> Cc: Kalderon, Michal; Woodruff, Robert J; Vladimir Sokolovsky;
>>>>> Amrani, Ram; Rahman, Ameen
>>>>> Subject: RE: OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
>>>>> Pacific Time (12pm EST) - Minutes
>>>>>
>>>>> Quick update on RC3..
>>>>>
>>>>> Broadcom has all critical bugs fixed and included in a new daily build.
>>>>> Thanks!
>>>>> http://downloads.openfabrics.org/OFED/ofed-4.8-2-daily/OFED-4.8-2-20180228-1121.tgz
>>>>>
>>>>>
>>>>> Our final blocking item is a critical "perftest hang" issue on a
>>>>> Cavium QL45412 RoCE adapter.
>>>>>
>>>>> Bug 2674<http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2674>
>>>>> "Unable to complete RDMA applications (perftest)".
>>>>>
>>>>> Michal, can we please get an ETA for the fix or a "won't fix"
>>>>> disposition so we can push forward with RC3?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Arlin
>>>>>
>>>>> From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of
>>>>> Davis, Arlin R
>>>>> Sent: Monday, February 26, 2018 1:04 PM
>>>>> To: ewg at lists.openfabrics.org
>>>>> Subject: [ewg] OFA EWG Meeting: Monday, Feb 26, 2017, 09:00 AM US
>>>>> Pacific Time (12pm EST) - Minutes
>>>>>
>>>>> Attendees:
>>>>>
>>>>> Rupert Dance          SW Forge
>>>>> Pradeep Kankipati     Broadcom
>>>>> Robert Woodruff       Intel
>>>>> Arlin Davis           Intel
>>>>> Michal Kalderon       Cavium
>>>>> Vladimir Sokolovsky   Mellanox
>>>>>
>>>>>
>>>>> Minutes:
>>>>>
>>>>>
>>>>>
>>>>> * Opens
>>>>>
>>>>>   o Broadcom's RC1 validation testing uncovered new critical bug. Fix is in
>>>>>     the works, would like to get fix into 4.8-2
>>>>>
>>>>>     - Broadcom will open new bug with details. (FIO stress test caused hang)
>>>>>
>>>>> * OFED 4.8-2 RC2 status:
>>>>>   http://downloads.openfabrics.org/OFED/ofed-4.8-2/OFED-4.8-2-rc2.tgz
>>>>>
>>>>>   o Release Notes:
>>>>>     http://downloads.openfabrics.org/OFED/release_notes/OFED_4.8-2-rc2-release_notes
>>>>>
>>>>>   o Test Status:
>>>>>
>>>>>     - Intel - RC2 build/validation (mlx4/5) RH 7.1, 7.2, 7.3, 7.4;
>>>>>       SLES 12.1, 12.2, 12.3 - Passed
>>>>>
>>>>>     - VMware - RC2 validation complete - Passed
>>>>>
>>>>>     - IWG interop results - new sightings for Cavium (perftest) and
>>>>>       Broadcom (FW update?).
>>>>>
>>>>>       * Rupert will work with Cavium/Broadcom to get OFED inbox driver
>>>>>         versions passing.
>>>>>
>>>>>       * Note: for PF 33 RoCE interop, we prefer to use OFED inbox instead
>>>>>         of out-of-box drivers.
>>>>>
>>>>>   o Bugs:
>>>>>
>>>>>     - All - please open new bugs for any new sighting
>>>>>
>>>>> * OFED 4.8-2 GA -- Not ready
>>>>>
>>>>>   o RC3 needed for new Broadcom bug and to get PF33 RoCE interop tests
>>>>>     passing with OFED inbox drivers.
>>>>>
>>>>> * OFED next
>>>>>
>>>>>   o No discussion, OFED 4.8-2 going to RC3.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>>
>>>>>
>>>>> Arlin
>>>>>