[Iwg-arbitration-committee] Updated arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)
Dave Wyman
dwyman at iol.unh.edu
Wed Jul 22 13:32:28 PDT 2015
Great, thanks Jess
-Dave
On 7/22/15 4:23 PM, Calciano, Jess wrote:
>
> Hello Dave,
>
> Both reports look good and are approved to publish to the Logo List.
>
> -- Jess
>
> *From:*David Wyman [mailto:dwyman at iol.unh.edu]
> *Sent:* Wednesday, July 22, 2015 12:08 PM
> *To:* Calciano, Jess
> *Cc:* iwg-arbitration-committee at openfabrics.org; OFA Lab Mailing List;
> Cole, Cliff
> *Subject:* Re: Updated arbitration request for Intel QLE7340 & QLE7342
> HCAs (Jan 2015 OFA Interop Logo Event)
>
> Hello Jess,
>
> Your arbitration request regarding the NFSoRDMA and RSockets issues
> found in the January 2015 Interop Logo Event has been approved and an
> updated report for Intel InfiniBand HCAs is attached. I also attached
> the Intel Switch Report. Please review both and, if you are satisfied
> with the content, respond with explicit consent to post these reports
> to the OpenFabrics Interoperability Logo List. Please contact me with
> any questions or concerns.
>
> Thank you,
>
> Dave Wyman
>
> UNH-IOL OpenFabrics Interoperability Logo Group
>
> On Wed, Jun 3, 2015 at 3:10 PM, Calciano, Jess
> <jess.calciano at intel.com> wrote:
>
> Hello,
>
> Since the original arbitration request was submitted, there’s been
> some further discussion about the RSockets failure. With the fix
> for librdmacm described in the original request, rstream ran
> successfully for most message sizes, but still hung with -S 1024.
>
> Additional investigation traced the new problem to an
> incompatibility between the qib driver and the ibv_create_qp()
> function. A workaround (described below) is available for the
> current OFED version, and a permanent fix to librdmacm will be
> included in the upcoming OFED 3.18 release.
>
> Details:
>
> The ultimate issue is still related to the qib driver being
> non-compliant with the ibv_create_qp() definition:
>
> The function ibv_create_qp() will update the qp_init_attr->cap
> struct with the actual QP values of the QP that was created;
> *** the values will be greater than or equal to the values
> requested. ***
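As a minimal sketch of what this contract means for a caller, the check below compares the capabilities that were requested with those written back after QP creation. The names struct sketch_qp_cap and sketch_caps_honor_request() are hypothetical; the struct only mirrors the field names of struct ibv_qp_cap from <infiniband/verbs.h> so the example compiles without libibverbs. In real code, the "actual" values would be the cap struct that ibv_create_qp() updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in that mirrors the field names of struct ibv_qp_cap,
 * redefined here so the sketch builds without libibverbs. */
struct sketch_qp_cap {
    uint32_t max_send_wr;
    uint32_t max_recv_wr;
    uint32_t max_send_sge;
    uint32_t max_recv_sge;
    uint32_t max_inline_data;
};

/* Returns true if the caps reported after QP creation honor the documented
 * contract: every value is greater than or equal to the value requested. */
static bool sketch_caps_honor_request(const struct sketch_qp_cap *requested,
                                      const struct sketch_qp_cap *actual)
{
    assert(requested != NULL && actual != NULL);
    return actual->max_send_wr     >= requested->max_send_wr &&
           actual->max_recv_wr     >= requested->max_recv_wr &&
           actual->max_send_sge    >= requested->max_send_sge &&
           actual->max_recv_sge    >= requested->max_recv_sge &&
           actual->max_inline_data >= requested->max_inline_data;
}
```

With the qib behavior described in this message, max_inline_data comes back smaller than requested, so a check like this would return false even though QP creation itself succeeded.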
>
> Specifically, the qib driver returns an inline size that is
> smaller than the one requested. Rsockets does read back the actual
> value, but only after its buffers have been set up:
>
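> inline_size = SOME_DEFAULT_LIKE_64;     /* default, e.g. 64 */
> rs_init_bufs(...);                      /* sizes and registers buffers */
> ...                                     /*   using the default value */
> rs_create_qp(...);
> inline_size = qp_cap->max_inline_data;  /* actual value from the driver */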
>
> The issue is that rs_init_bufs(), which allocates the buffers and
> registers the memory, uses the default inline size. As a result,
> rsockets references memory outside the registered memory region
> when sending credit updates. The lost credit update causes the
> hang that you see.
>
> A quick check shows that I can move the rs_init_bufs() call after
> the QP has been created and the test works. Alternatively, you
> should be able to override the inline size through a config file:
> writing the value 0 into /etc/rdma/rsocket/inline_default sets the
> default inline_size to 0. (The actual path depends on your
> configuration, so it could be under /usr/etc/rdma/..., for
> example.) Updating the config file should work with the current
> version.
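The failure and both workarounds described above can be illustrated with a deliberately simplified, hypothetical model; it is not the librdmacm code. It assumes that buffer setup reserves registered space for control messages (such as credit updates) only when the inline size known at that moment is too small to send them inline. All names and sizes here (SKETCH_SBUF_SIZE, SKETCH_CTRL_MSG, SKETCH_CTRL_SLOTS, sketch_registered_len(), sketch_credit_update_in_region()) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration; NOT librdmacm's values. */
#define SKETCH_SBUF_SIZE  4096u /* data portion of the send buffer */
#define SKETCH_CTRL_MSG     16u /* bytes in one credit-update message */
#define SKETCH_CTRL_SLOTS    4u /* control slots reserved when needed */

/* Simplified model of buffer setup: control-message space is added to the
 * registered region only if the inline size known at this point is too
 * small to carry a control message inline. */
static size_t sketch_registered_len(unsigned inline_size)
{
    size_t len = SKETCH_SBUF_SIZE;
    if (inline_size < SKETCH_CTRL_MSG)
        len += (size_t)SKETCH_CTRL_MSG * SKETCH_CTRL_SLOTS;
    return len;
}

/* A credit update is either sent inline (judged by the QP's actual inline
 * size) or copied into the first control slot after the data buffer.
 * Returns true if the memory it references is inside the registered region. */
static bool sketch_credit_update_in_region(size_t registered_len,
                                           unsigned actual_inline)
{
    assert(registered_len >= SKETCH_SBUF_SIZE);
    if (actual_inline >= SKETCH_CTRL_MSG)
        return true; /* inline send: no registered memory referenced */
    return SKETCH_SBUF_SIZE + SKETCH_CTRL_MSG <= registered_len;
}
```

Assuming the driver reports an actual inline size of 0 against a requested default of 64: sizing the buffers with sketch_registered_len(64), as in the original ordering, makes sketch_credit_update_in_region(..., 0) return false, while sizing them with the value reported after QP creation, or with a default of 0 read from inline_default, makes it return true.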
>
> I will provide an update to librdmacm to handle this; that update
> will be included in the 3.18 release.
>
> Thanks,
>
> Jess Calciano
>
> *From:*Calciano, Jess
> *Sent:* Wednesday, April 08, 2015 2:39 PM
> *To:* iwg-arbitration-committee at openfabrics.org
> *Cc:* OFA Lab Mailing List; Dave Wyman; Rupert Dance
> <rsdance at soft-forge.com>; Cole, Cliff; Mascarenhas, Edward;
> Sharma, Karun; Thete, Swapna; Hefty, Sean; Yan, Philip W;
> Flores, Jose F
> *Subject:* Arbitration request for Intel QLE7340 & QLE7342 HCAs
> (Jan 2015 OFA Interop Logo Event)
>
> Hello,
>
> Intel would like to file an arbitration request for the January
> 2015 OFA Interop Logo Event results for the Intel QLE7340 and
> QLE7342 HCAs.
>
> The provided report (attached for reference) shows two failing tests:
>
> 1) TI NFS over RDMA
>
> 2) TI RSockets
>
> The Intel team has investigated these results and determined that
> the failures are due to bugs in non-Intel components.
>
> NFSoRDMA:
>
> The failure is due to a known Connectathon issue, documented here:
>
> http://www.spinics.net/lists/linux-nfs/msg16460.html
>
> RSockets:
>
> The issue is that ibv_modify_qp() is failing because an incorrect
> bit is set in the qp_attr_mask returned from the kernel. With the
> Intel HCA, bit 21 of the qp_attr_mask is set; with a Mellanox HCA,
> it is not.
>
> Bit 21 is not defined for userspace. However, it was defined in
> the kernel as IB_QP_SMAC.
>
> If librdmacm is modified to mask out this bit, the call
> succeeds and rstream runs successfully.
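The masking fix described above amounts to clearing one bit before the mask reaches ibv_modify_qp(). This is a minimal sketch, not the actual librdmacm patch: the macro and function names are hypothetical, and the constants are redefined locally (IBV_QP_DEST_QPN is 1 << 20 in libibverbs and was the last userspace mask bit at the time; IB_QP_SMAC is 1 << 21 in the kernel) so the example builds without <infiniband/verbs.h>.

```c
#include <assert.h>

/* Last attribute-mask bit defined for userspace in the libibverbs of that
 * era (IBV_QP_DEST_QPN); redefined so the sketch builds standalone. */
#define SKETCH_IBV_QP_DEST_QPN (1u << 20)

/* Bit 21: undefined for userspace, but IB_QP_SMAC in the kernel. */
#define SKETCH_KERNEL_QP_SMAC  (1u << 21)

static_assert(SKETCH_KERNEL_QP_SMAC > SKETCH_IBV_QP_DEST_QPN,
              "bit 21 lies above the last userspace mask bit");

/* Hypothetical helper: clear the kernel-only bit from a mask returned by
 * the kernel before it is passed to ibv_modify_qp(). All other bits are
 * left untouched. */
static unsigned sketch_strip_kernel_smac(unsigned qp_attr_mask)
{
    return qp_attr_mask & ~SKETCH_KERNEL_QP_SMAC;
}
```

Clearing only bit 21 matches the fix described in this request. A more defensive variant would keep only the bits userspace defines, at the cost of silently dropping any other kernel-only bits that might appear.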
>
> Please let me know if the arbitration committee needs any
> additional information on the analysis.
>
> Thanks,
>
> Jess Calciano
>