[Iwg-arbitration-committee] Updated arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)
Calciano, Jess
jess.calciano at intel.com
Wed Jul 22 13:23:34 PDT 2015
Hello Dave,
Both reports look good and are approved to publish to the Logo List.
-- Jess
From: David Wyman [mailto:dwyman at iol.unh.edu]
Sent: Wednesday, July 22, 2015 12:08 PM
To: Calciano, Jess
Cc: iwg-arbitration-committee at openfabrics.org; OFA Lab Mailing List; Cole, Cliff
Subject: Re: Updated arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)
Hello Jess,
Your arbitration request regarding the NFSoRDMA and RSockets issues found in the January 2015 Interop Logo Event has been approved, and an updated report for the Intel InfiniBand HCAs is attached. I have also attached the Intel Switch Report. Please review both and, if you are satisfied with the content, respond with explicit consent to post these reports to the OpenFabrics Interoperability Logo List. Please contact me with any questions or concerns.
Thank you,
Dave Wyman
UNH-IOL OpenFabrics Interoperability Logo Group
On Wed, Jun 3, 2015 at 3:10 PM, Calciano, Jess <jess.calciano at intel.com> wrote:
Hello,
Since the original arbitration request was submitted, there’s been some further discussion about the RSockets failure. With the fix for librdmacm described in the original request, rstream ran successfully for most message sizes, but still hung with -S 1024.
Additional investigation traced the new problem to an incompatibility between the qib driver and the ibv_create_qp() function. A workaround (described below) is available for the current OFED version and a permanent fix to librdmacm will be included in the next OFED 3.18 release.
Details:
The ultimate issue is still related to the qib driver being non-compliant with the ibv_create_qp() definition:
The function ibv_create_qp() will update the qp_init_attr->cap
struct with the actual QP values of the QP that was created;
*** the values will be greater than or equal to the values requested. ***
Specifically, the qib driver returns an inline size that is smaller than the one requested. Rsockets has code to handle this, but that code runs in the following order:
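inline_size = SOME_DEFAULT_LIKE_64;      /* default, chosen before the QP exists */
rs_init_bufs(...);                       /* buffers allocated and registered using the default */
...
rs_create_qp(...);
inline_size = qp_cap->max_inline_size;   /* actual value, picked up too late for rs_init_bufs() */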
The issue is that rs_init_bufs(), which allocates the buffers and registers the memory, uses the default inline size. The net result is that rsockets ends up referencing memory that is outside of the registered memory region when sending credit updates. The lost credit update is causing the hang that you see.
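The ordering problem described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the librdmacm code: the function names, the constants, and the idea that credit updates are staged in extra space after the data area when they cannot be sent inline are all hypothetical, chosen to show why sizing the buffers before the QP reports its real inline size can leave part of the send path outside the registered region.

```c
#include <stddef.h>

/* All constants and names below are hypothetical, chosen for illustration. */
#define CTRL_MSG_SIZE 8   /* assumed size of one credit update, in bytes */
#define CTRL_SLOTS    4   /* assumed number of staged credit updates */

/* Bytes to allocate and register for a send buffer. Assumption: when the
 * inline size cannot hold a credit update, the updates are staged in extra
 * space placed after the data area. */
size_t registered_bytes(size_t data_bytes, size_t inline_size)
{
    size_t total = data_bytes;
    if (inline_size < CTRL_MSG_SIZE)
        total += (size_t)CTRL_MSG_SIZE * CTRL_SLOTS;
    return total;
}

/* Returns 1 if a region sized with sizing_inline covers every byte the
 * sender uses once the QP reports granted_inline, 0 otherwise. */
int sends_stay_in_bounds(size_t data_bytes, size_t sizing_inline,
                         size_t granted_inline)
{
    return registered_bytes(data_bytes, granted_inline)
        <= registered_bytes(data_bytes, sizing_inline);
}
```

In this model, sends_stay_in_bounds(4096, 64, 0) corresponds to the failing order (buffers sized with a default of 64, QP then grants 0) and returns 0, while sends_stay_in_bounds(4096, 0, 0) corresponds to sizing the buffers with the granted value and returns 1.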
A quick check shows that I can move the rs_init_bufs() call after the qp has been created and have the test work. You should also be able to override the inline_size by writing the value 0 into a config file. This will set the inline_size to 0 as the default. To do this, you need to write a 0 into /etc/rdma/rsocket/inline_default. (The actual path will depend on your configuration, so it could be under /usr/etc/rdma/... for example.) Updating the config file should work with the current version.
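For anyone who prefers to script the config file workaround, a minimal sketch in C follows. The helper name is hypothetical, and writing the value as "0" followed by a newline is an assumption about what the file should contain; the path must be supplied by the caller because, as noted above, it depends on the local configuration.

```c
#include <stdio.h>

/* Hypothetical helper: writes "0\n" to the given path, replacing any
 * existing contents. Returns 0 on success, -1 on any failure. */
int write_inline_default_zero(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs("0\n", f) >= 0;
    if (fclose(f) != 0)   /* fclose can report a deferred write error */
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would pass the path for its own installation, for example write_inline_default_zero("/etc/rdma/rsocket/inline_default"), and would typically need root privileges to write there.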
I will provide an update to librdmacm to handle this. That update will find its way into the 3.18 release.
Thanks,
Jess Calciano
From: Calciano, Jess
Sent: Wednesday, April 08, 2015 2:39 PM
To: iwg-arbitration-committee at openfabrics.org
Cc: OFA Lab Mailing List; Dave Wyman; Rupert Dance <rsdance at soft-forge.com>; Cole, Cliff; Mascarenhas, Edward; Sharma, Karun; Thete, Swapna; Hefty, Sean; Yan, Philip W; Flores, Jose F
Subject: Arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)
Hello,
Intel would like to file an arbitration request for the January 2015 OFA Interop Logo Event results for the Intel QLE7340 and QLE7342 HCAs.
The provided report (attached for reference) shows two failing tests:
1) TI NFS over RDMA
2) TI RSockets
The Intel team has investigated these results and determined that the failures are due to bugs in non-Intel components.
NFSoRDMA:
The failure is due to a known Connectathon issue, documented here:
http://www.spinics.net/lists/linux-nfs/msg16460.html
RSockets:
The call to ibv_modify_qp() fails because an incorrect bit is set in the qp_attr_mask that is returned from the kernel. With the Intel HCA, bit 21 of the qp_attr_mask is set. This is not the case with a Mellanox HCA.
Bit 21 is not defined for userspace. However, it was defined in the kernel as IB_QP_SMAC.
If librdmacm is modified to mask out this bit, the call succeeds and rstream runs successfully.
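Masking out the bit amounts to a single bitwise operation on the mask before it is passed to ibv_modify_qp(). The sketch below shows only that operation; the macro and function names are hypothetical and this is not the actual librdmacm change. The mask is typed as int to match the attr_mask parameter of ibv_modify_qp().

```c
/* Hypothetical name for bit 21, which the kernel defines as IB_QP_SMAC
 * but which has no userspace definition. */
#define QP_ATTR_MASK_BIT21 (1 << 21)

/* Return qp_attr_mask with bit 21 cleared and all other bits unchanged. */
int strip_bit21(int qp_attr_mask)
{
    return qp_attr_mask & ~QP_ATTR_MASK_BIT21;
}
```

The result of strip_bit21() would then be used in place of the original mask when calling ibv_modify_qp().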
Please let me know if the arbitration committee needs any additional information on the analysis.
Thanks,
Jess Calciano