[ewg] [GIT PULL compat-rdma] qib for OFED 4.8

Marty Schlining mschlining at ddn.com
Mon Jun 19 13:12:14 PDT 2017


I tested SL 7.2 with OFED 4.8-rc4. Scanning SRP targets yielded the following response:

ib_srp: Sending CM DREQ failed

The SRP targets were never scanned (see Bug 2632: http://bugs.openfabrics.org/show_bug.cgi?id=2632).

I attempted to use Bart Van Assche’s ib_srp_backport driver on top of SL 7.2 and OFED 4.8-rc4, following his instructions. The ib_srp_backport driver will not compile with OFED 4.8-rc4 installed (see attached text file). I was able to uninstall OFED 4.8-rc4 and then compile the driver. However, once OFED 4.8-rc4 was reinstalled, the ib_srp_backport driver was no longer compatible with it.

That’s all of the testing so far.

-Marty

From: Davis, Arlin R [mailto:arlin.r.davis at intel.com]
Sent: Monday, June 19, 2017 4:02 PM
To: Marty Schlining <mschlining at ddn.com>; Woodruff, Robert J <robert.j.woodruff at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

Sorry, I am still trying to understand what is working and what is needed to fix this for OFED 4.8.

When you said you tested the upstream kernel, was it the SL 7.2 kernel plus the backports from Bart’s 4.11 drop? Do you need the 4.11 kernel base driver set for SRP to work properly?

-arlin

From: Marty Schlining [mailto:mschlining at ddn.com]
Sent: Monday, June 19, 2017 9:42 AM
To: Davis, Arlin R <arlin.r.davis at intel.com>; Woodruff, Robert J <robert.j.woodruff at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

No upstream kernel. SL 7.2 (3.10.0-327.el7.x86_64), OFED-4.8-rc4. I was unable to build the ib_srp backport driver on this platform with OFED 4.8-rc4 installed; there were multiple build errors. However, if I uninstall OFED-4.8-rc4, I am able to build the upstream ib_srp RPMs on SL 7.2.

I obtained the ib_srp backport driver as follows:

git clone https://github.com/bvanassche/ib_srp-backport.git

-Marty


From: Davis, Arlin R [mailto:arlin.r.davis at intel.com]
Sent: Monday, June 19, 2017 12:36 PM
To: Marty Schlining <mschlining at ddn.com>; Woodruff, Robert J <robert.j.woodruff at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

Marty,

What upstream kernel did you use to verify SRP? Was it on a 4.8 base or something newer? If newer, maybe we are missing some key SRP fixes in our OFED 4.8 kernel base.

Thanks, Arlin

From: Marty Schlining [mailto:mschlining at ddn.com]
Sent: Friday, June 16, 2017 1:00 PM
To: Davis, Arlin R <arlin.r.davis at intel.com>; Woodruff, Robert J <robert.j.woodruff at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

I can’t say for sure. I am attempting to build the upstream ib_srp driver for this platform, but I am running into build issues. Bart, I will send you a separate email on that subject.

From: Davis, Arlin R [mailto:arlin.r.davis at intel.com]
Sent: Friday, June 16, 2017 3:58 PM
To: Woodruff, Robert J <robert.j.woodruff at intel.com>; Marty Schlining <mschlining at ddn.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

Another SRP critical issue just opened. Is this also a backport issue?

Bug 2632 (http://bugs.openfabrics.org/show_bug.cgi?id=2632) - SRP Login failure for SL 7.2 and OFED 4.8-rc4 (ib_srp: Sending CM DREQ failed)


From: Woodruff, Robert J
Sent: Friday, June 16, 2017 10:24 AM
To: Marty Schlining <mschlining at ddn.com>; Davis, Arlin R <arlin.r.davis at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

OK, then it sounds like it is important enough to hold the release until we have a fix. I assume this is something you can look at, Vlad?

From: Marty Schlining [mailto:mschlining at ddn.com]
Sent: Friday, June 16, 2017 10:17 AM
To: Woodruff, Robert J <robert.j.woodruff at intel.com>; Davis, Arlin R <arlin.r.davis at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

I tested the upstream driver from Bart Van Assche, and it does not have the same issue as the OFED-4.8-rc4 ib_srp driver. The upstream driver handles the SRP_Reject as expected without crashing the SL 7.3 kernel. That is also detailed in the defect.

From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com]
Sent: Friday, June 16, 2017 1:14 PM
To: Marty Schlining <mschlining at ddn.com>; Davis, Arlin R <arlin.r.davis at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

So I guess the next question is: does anyone have a fix that can be included? Also, do you know whether this is a problem with the backport, or is it also broken in the upstream kernel?

From: Marty Schlining [mailto:mschlining at ddn.com]
Sent: Friday, June 16, 2017 10:03 AM
To: Woodruff, Robert J <robert.j.woodruff at intel.com>; Davis, Arlin R <arlin.r.davis at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Cedric Fernandes <cfernandes at ddn.com>; Mike Davis <mdavis at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

I do not think this is an acceptable workaround. An SRP target could have many hosts (IO nodes) connected, not just the one that may have been abruptly rebooted (due to a power failure or a pulled power cord). Are you suggesting that, as a workaround, customers must reboot the SRP target and all of the IO nodes in their cluster attached to it, thereby taking the entire file system offline?

The backported ib_srp driver has a defect: it does not handle an SRP_Reject properly, leaving a null pointer for the blk_mq driver. Normally, this would just be a note in the log, and the srp_daemon could be set up to attempt a reconnect after 30 seconds. But that is no longer possible with the current defect: the host would crash and reboot every time a connection is attempted.

-Marty Schlining

From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com]
Sent: Friday, June 16, 2017 12:18 PM
To: Davis, Arlin R <arlin.r.davis at intel.com>; RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; Marty Schlining <mschlining at ddn.com>
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

When I read the bug, it looks like the workaround is to also reboot the target whenever the host is rebooted. Is this workaround acceptable for the short term, with the issue then fixed in OFED-4.8-1? That will be a quick-turnaround release, since the new functionality being added to OFED-4.8-1 is limited. If people are OK with that, I would recommend moving to GA for OFED-4.8 and then starting work on OFED-4.8-1 right away. OFED-4.8 has been dragging on forever, and there are people who will want to start using it.

From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Davis, Arlin R
Sent: Friday, June 16, 2017 9:13 AM
To: RSD at SFI <rsdance at soft-forge.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: bart.vanassche at gmail.com; ewg at lists.openfabrics.org; mschlining at ddn.com
Subject: Re: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

What is the visibility and impact of the bug? Is this something that would be seen in normal use cases?
Is this actively being worked? What is the ETA for a fix?

From: RSD at SFI [mailto:rsdance at soft-forge.com]
Sent: Thursday, June 15, 2017 8:02 PM
To: Davis, Arlin R <arlin.r.davis at intel.com>; 'Vladimir Sokolovsky' <vlad at dev.mellanox.co.il>
Cc: ewg at lists.openfabrics.org
Subject: RE: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

SRP is very important to DDN and they have been a major supporter of IB and the OFA for a long time. Marty always attends the OFA Interop events and has been a big supporter of the OFA Logo program. Therefore I vote that we wait and find a fix.

From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Davis, Arlin R
Sent: Thursday, June 15, 2017 4:20 PM
To: Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Cc: ewg at lists.openfabrics.org
Subject: Re: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

I just noticed a new critical SRP bug with RC4. Do we document it as a “known issue” or hold off on GA for a fix?

http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2631


From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Davis, Arlin R
Sent: Thursday, June 15, 2017 10:57 AM
To: Vladimir Sokolovsky <vlad at dev.mellanox.co.il>; Schmidt, William R <william.r.schmidt at intel.com>; Schulfer, Pawel <pawel.schulfer at intel.com>
Cc: ewg at lists.openfabrics.org
Subject: Re: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

Last call for release note changes. Please get them to Vlad by end of the day today so we can wrap this up.

Vlad, please roll GA tomorrow with all updated release notes.

Thanks everyone!

From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Vladimir Sokolovsky
Sent: Thursday, June 15, 2017 8:58 AM
To: Schmidt, William R <william.r.schmidt at intel.com>; Schulfer, Pawel <pawel.schulfer at intel.com>
Cc: ewg at lists.openfabrics.org
Subject: Re: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

This fix will get into GA: as I mentioned before, it was already applied, and the GA has not been built yet.

Regards,
Vladimir
On 06/15/2017 12:51 AM, Schmidt, William R wrote:
Vlad,

Did this fix get into the GA version of OFED 4.8?

From: ewg [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Vladimir Sokolovsky
Sent: Thursday, June 8, 2017 12:17 PM
To: Schulfer, Pawel <pawel.schulfer at intel.com>
Cc: ewg at lists.openfabrics.org
Subject: Re: [ewg] [GIT PULL compat-rdma] qib for OFED 4.8

On 06/07/2017 11:16 PM, Schulfer, Pawel wrote:
Hi Vlad,

Please pull from git://flatbed.openfabrics.org/~pschulfer/compat-rdma:for_vlad
commit 54931166983d20cf7d12c47f081fd5e86de1cb32  –  Support for Kernel 4.8 in QIB on OFED-4.8

Thanks,
Pawel


Hi Pawel,
Done + build: OFED-4.8-20170608-1100.tgz

Regards,
Vladimir


---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Słowackiego 173 | 80-298 Gdańsk | District Court Gdańsk-Północ | 7th Commercial Division of the National Court Register - KRS 101882 | NIP (tax ID) 957-07-52-316 | Share capital 200,000 PLN.

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.


_______________________________________________
ewg mailing list
ewg at lists.openfabrics.org
http://lists.openfabrics.org/mailman/listinfo/ewg


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sl7.2-ofed-4.8-rc4-ib-srp-backport-compile_issue.txt
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20170619/d97e6a7b/attachment.txt>

