[Iwg-arbitration-committee] Updated arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)

Calciano, Jess jess.calciano at intel.com
Wed Jun 3 12:10:48 PDT 2015


Hello,

Since the original arbitration request was submitted, there's been some further discussion about the RSockets failure. With the fix for librdmacm described in the original request, rstream ran successfully for most message sizes, but still hung with -S 1024.

Additional investigation traced the new problem to the qib driver not complying with the documented behavior of ibv_create_qp(). A workaround (described below) is available for the current OFED version, and a permanent fix to librdmacm will be included in the upcoming OFED 3.18 release.

Details:

The underlying issue is that the qib driver does not comply with the ibv_create_qp() definition:

            The function ibv_create_qp() will update the qp_init_attr->cap
            struct with the actual QP values of the QP that was created;
            *** the values will be greater than or equal to the values requested. ***
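
As a rough illustration of that contract, a caller could compare the requested and returned capabilities field by field. This is only a sketch: `qp_cap_sketch` is a hypothetical stand-in that mirrors the fields of `struct ibv_qp_cap`, and `caps_meet_request()` is an illustrative helper, not part of libibverbs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the capability fields in struct ibv_qp_cap. */
struct qp_cap_sketch {
    uint32_t max_send_wr;
    uint32_t max_recv_wr;
    uint32_t max_send_sge;
    uint32_t max_recv_sge;
    uint32_t max_inline_data;
};

/* Returns true when every returned value is >= the requested value. */
bool caps_meet_request(const struct qp_cap_sketch *requested,
                       const struct qp_cap_sketch *returned)
{
    return returned->max_send_wr     >= requested->max_send_wr &&
           returned->max_recv_wr     >= requested->max_recv_wr &&
           returned->max_send_sge    >= requested->max_send_sge &&
           returned->max_recv_sge    >= requested->max_recv_sge &&
           returned->max_inline_data >= requested->max_inline_data;
}
```

With a request of 64 bytes of inline data and a returned value of 0, the helper reports a mismatch.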

Specifically, the qib driver returns an inline size that is smaller than the one requested. Rsockets has code to handle this, but the relevant sequence looks like this:

            inline_size = SOME_DEFAULT_LIKE_64;
            rs_init_bufs(...);      /* sizes and registers buffers using the default */
            ...
            rs_create_qp(...);
            inline_size = qp_cap->max_inline_data;   /* actual value, set too late */

The issue is that rs_init_bufs(), which allocates the buffers and registers the memory, runs before the QP is created and so uses the default inline size rather than the value the driver returns. The net result is that rsockets ends up referencing memory outside the registered memory region when sending credit updates. The lost credit update causes the hang that you see.
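
The ordering problem can be modeled in a few lines. This is a simplified sketch, not the rsockets source: every name and constant is hypothetical, the driver is assumed to grant an inline size of 0, and the buffer layout (registered space reserved for control messages only when they cannot be sent inline) is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_INLINE 64u  /* stand-in for the library default */
#define CTRL_MSG_SIZE   4u  /* hypothetical credit-update size in bytes */
#define CTRL_MSG_SLOTS  4u  /* hypothetical number of in-flight updates */

/* Models a driver that grants less inline space than requested (here: 0). */
uint32_t fake_create_qp(uint32_t requested_inline)
{
    (void)requested_inline;
    return 0;
}

/* Registered bytes needed for control messages that cannot be sent inline. */
size_t ctrl_bytes_needed(uint32_t inline_size)
{
    return inline_size < CTRL_MSG_SIZE ? CTRL_MSG_SIZE * CTRL_MSG_SLOTS : 0;
}

/* Original order: buffers are sized before the QP reports its real limit. */
size_t plan_buffers_before_qp(void)
{
    uint32_t inline_size = DEFAULT_INLINE;
    size_t reserved = ctrl_bytes_needed(inline_size); /* like rs_init_bufs() */
    inline_size = fake_create_qp(inline_size);        /* like rs_create_qp() */
    (void)inline_size;                                /* learned too late */
    return reserved;
}

/* Reordered: create the QP first, then size buffers from the real limit. */
size_t plan_buffers_after_qp(void)
{
    uint32_t inline_size = fake_create_qp(DEFAULT_INLINE);
    return ctrl_bytes_needed(inline_size);
}
```

In this model the original order reserves no registered space for credit updates even though they can no longer be sent inline, while the reordered version reserves it.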

A quick check shows that moving the rs_init_bufs() call to after the QP has been created makes the test pass. As a workaround, you can also override the default inline size by writing the value 0 into the config file /etc/rdma/rsocket/inline_default. (The actual path depends on your configuration, so it could be under /usr/etc/rdma/... for example.) Updating the config file should work with the current version.
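
For illustration, the override amounts to a file holding a single integer. The sketch below writes such a file and reads it back with a fallback. The helper names are hypothetical and the parsing is an assumption about the general shape of such a config read, not the librdmacm implementation. The path is passed in by the caller because it depends on the installation.

```c
#include <stdio.h>

/* Writes a single integer to the config file; returns 0 on success. */
int write_inline_default(const char *path, unsigned value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%u\n", value) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Reads the override; falls back if the file is missing or malformed. */
unsigned read_inline_default(const char *path, unsigned fallback)
{
    unsigned value = fallback;
    FILE *f = fopen(path, "r");
    if (!f)
        return fallback;
    if (fscanf(f, "%u", &value) != 1)
        value = fallback;
    fclose(f);
    return value;
}
```

Calling `write_inline_default()` with the inline_default path for your installation and the value 0 corresponds to the workaround described above. Writing under /etc normally requires root privileges.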

I will provide an update to librdmacm to handle this. That update will be included in the 3.18 release.

Thanks,
Jess Calciano



From: Calciano, Jess
Sent: Wednesday, April 08, 2015 2:39 PM
To: iwg-arbitration-committee at openfabrics.org
Cc: OFA Lab Mailing List; Dave Wyman; Rupert Dance <rsdance at soft-forge.com> (rsdance at soft-forge.com); Cole, Cliff; Mascarenhas, Edward; Sharma, Karun; Thete, Swapna; Hefty, Sean; Yan, Philip W; Flores, Jose F
Subject: Arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)

Hello,

Intel would like to file an arbitration request for the January 2015 OFA Interop Logo Event results for the Intel QLE7340 and QLE7342 HCAs.

The provided report (attached for reference) shows two failing tests:

1)      TI NFS over RDMA

2)      TI RSockets

The Intel team has investigated these results and determined that the failures are due to bugs in non-Intel components.

NFSoRDMA:
The failure is due to a known Connectathon issue, documented here:
http://www.spinics.net/lists/linux-nfs/msg16460.html

RSockets:
The issue is that ibv_modify_qp() is failing because an incorrect bit is set in the qp_attr_mask returned from the kernel. With the Intel HCAs, bit 21 of the qp_attr_mask is set. This is not the case for a Mellanox HCA.

Bit 21 is not defined for userspace.  However, it was defined in the kernel as IB_QP_SMAC.

If librdmacm is modified to mask out this bit, the call succeeds and rstream runs successfully.
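
A minimal sketch of that masking step is shown here. The macro and helper names are hypothetical and this is not the actual librdmacm patch. The mask is modeled as an unsigned 32-bit value to keep the bit arithmetic simple.

```c
#include <stdint.h>

/* Bit 21: IB_QP_SMAC in the kernel, not defined for userspace. */
#define KERNEL_ONLY_QP_SMAC_BIT (1u << 21)

/* Hypothetical helper: drop the kernel-only bit before ibv_modify_qp(). */
uint32_t strip_kernel_only_bits(uint32_t qp_attr_mask)
{
    return qp_attr_mask & ~KERNEL_ONLY_QP_SMAC_BIT;
}
```

All other bits in the mask pass through unchanged, so the helper has no effect on an HCA that never sets bit 21.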

Please let me know if the arbitration committee needs any additional information on the analysis.


Thanks,
Jess Calciano