[Iwg-arbitration-committee] Updated arbitration request for Intel QLE7340 & QLE7342 HCAs (Jan 2015 OFA Interop Logo Event)

Dave Wyman dwyman at iol.unh.edu
Wed Jul 22 13:32:28 PDT 2015


Great, thanks Jess

-Dave

On 7/22/15 4:23 PM, Calciano, Jess wrote:
>
> Hello Dave,
>
> Both reports look good and are approved to publish to the Logo List.
>
> -- Jess
>
> *From:* David Wyman [mailto:dwyman at iol.unh.edu]
> *Sent:* Wednesday, July 22, 2015 12:08 PM
> *To:* Calciano, Jess
> *Cc:* iwg-arbitration-committee at openfabrics.org; OFA Lab Mailing List; 
> Cole, Cliff
> *Subject:* Re: Updated arbitration request for Intel QLE7340 & QLE7342 
> HCAs (Jan 2015 OFA Interop Logo Event)
>
> Hello Jess,
>
> Your arbitration request regarding the NFSoRDMA and RSockets issues
> found in the January 2015 Interop Logo Event has been approved, and an
> updated report for Intel InfiniBand HCAs is attached.  I have also
> attached the Intel Switch Report.  Please review both and, if you are
> satisfied with the content, respond with explicit consent to post these
> reports to the OpenFabrics Interoperability Logo List.  Please contact
> me with any questions or concerns.
>
> Thank you,
>
> Dave Wyman
>
> UNH-IOL OpenFabrics Interoperability Logo Group
>
> On Wed, Jun 3, 2015 at 3:10 PM, Calciano, Jess
> <jess.calciano at intel.com> wrote:
>
>     Hello,
>
>     Since the original arbitration request was submitted, there’s been
>     some further discussion about the RSockets failure. With the fix
>     for librdmacm described in the original request, rstream ran
>     successfully for most message sizes, but still hung with -S 1024.
>
>     Additional investigation traced the new problem to an
>     incompatibility between the qib driver and the ibv_create_qp()
>     function. A workaround (described below) is available for the
>     current OFED version and a permanent fix to librdmacm will be
>     included in the next OFED 3.18 release.
>
>     Details:
>
>     The ultimate issue is still related to the qib driver being
>     non-compliant with the ibv_create_qp() definition:
>
>                 The function ibv_create_qp() will update the
>                 qp_init_attr->cap struct with the actual QP values
>                 of the QP that was created;
>                 *** the values will be greater than or equal to the
>                 values requested. ***
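
The requirement quoted above can be written down as a small check. This is only a sketch: struct toy_qp_cap is a local stand-in that mirrors the field names of struct ibv_qp_cap so that the example builds without libibverbs, and caps_meet_request() is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the fields of struct ibv_qp_cap (assumption:
 * defined here only so the sketch compiles without <infiniband/verbs.h>). */
struct toy_qp_cap {
    uint32_t max_send_wr;
    uint32_t max_recv_wr;
    uint32_t max_send_sge;
    uint32_t max_recv_sge;
    uint32_t max_inline_data;
};

/* Hypothetical helper: returns 1 when every granted value is greater than
 * or equal to the requested one, and 0 otherwise. */
static int caps_meet_request(const struct toy_qp_cap *requested,
                             const struct toy_qp_cap *granted)
{
    assert(requested != NULL && granted != NULL);
    return granted->max_send_wr     >= requested->max_send_wr &&
           granted->max_recv_wr     >= requested->max_recv_wr &&
           granted->max_send_sge    >= requested->max_send_sge &&
           granted->max_recv_sge    >= requested->max_recv_sge &&
           granted->max_inline_data >= requested->max_inline_data;
}
```

With a request for 64 bytes of inline data, any granted max_inline_data below 64 fails this check, which is the situation described for the qib driver in the text that follows.
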
>
>     Specifically, the qib driver returns an inline size that is
>     smaller than the one requested.  Rsockets has code to handle this
>     case, but that code looks like this:
>
>                 inline_size = SOME_DEFAULT_LIKE_64;
>                 rs_init_bufs(...);   /* buffers sized with the default */
>                 ...
>                 rs_create_qp(...);
>                 inline_size = qp_cap->max_inline_size;   /* too late */
>
>     The issue is that rs_init_bufs(), which allocates the buffers and
>     registers the memory, uses the default inline size.  The net
>     result is that rsockets ends up referencing memory that is outside
>     of the registered memory region when sending credit updates. The
>     lost credit update is causing the hang that you see.
>
>     A quick check shows that moving the rs_init_bufs() call after the
>     QP has been created makes the test pass.  You should also be able
>     to override the inline_size by writing the value 0 into a config
>     file, which sets the default inline_size to 0.  To do this, write
>     a 0 into /etc/rdma/rsocket/inline_default.  (The actual path
>     depends on your configuration, so it could be under
>     /usr/etc/rdma/... for example.)  Updating the config file should
>     work with the current version.
>
>     I will provide an update to librdmacm to handle this.  That
>     update will be included in the 3.18 release.
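
As a rough illustration of why the ordering matters, here is a toy model in C. It is not the librdmacm code. Every toy_ name and constant is made up for the sketch, and it assumes (this is not stated in the message) that a credit update needs its own registered space only when it is too large to be sent inline.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CTRL_MSG_SIZE 16   /* hypothetical size of a credit update */

struct toy_sock {
    size_t inline_size;      /* inline size the socket currently assumes */
    size_t data_len;         /* bytes reserved for payload */
    size_t registered_len;   /* bytes covered by the memory registration */
};

/* Stand-in for rs_init_bufs(): sizes the registered region from whatever
 * inline_size holds at the time of the call. */
static void toy_init_bufs(struct toy_sock *s, size_t data_len)
{
    assert(data_len > 0);
    s->data_len = data_len;
    s->registered_len = data_len;
    if (s->inline_size < TOY_CTRL_MSG_SIZE)
        s->registered_len += TOY_CTRL_MSG_SIZE;  /* room for a non-inline update */
}

/* Stand-in for QP creation on a device that may grant less inline space
 * than was requested. */
static void toy_create_qp(struct toy_sock *s, size_t granted_inline)
{
    s->inline_size = granted_inline;
}

/* Returns 1 if a credit update stays inside registered memory, else 0. */
static int toy_credit_update_ok(const struct toy_sock *s)
{
    if (s->inline_size >= TOY_CTRL_MSG_SIZE)
        return 1;  /* sent inline, no registered buffer needed */
    return s->data_len + TOY_CTRL_MSG_SIZE <= s->registered_len;
}

/* Runs the two setup steps in either order and reports the outcome. */
static int toy_setup(size_t default_inline, size_t granted_inline,
                     int bufs_before_qp)
{
    struct toy_sock s = { .inline_size = default_inline };
    if (bufs_before_qp) {
        toy_init_bufs(&s, 4096);
        toy_create_qp(&s, granted_inline);
    } else {
        toy_create_qp(&s, granted_inline);
        toy_init_bufs(&s, 4096);
    }
    return toy_credit_update_ok(&s);
}
```

In this model, toy_setup(64, 0, 1) corresponds to the failing order (buffers sized for a default of 64, then the device grants 0). toy_setup(64, 0, 0) corresponds to moving rs_init_bufs() after QP creation, and toy_setup(0, 0, 1) to the config file workaround that sets the default to 0.
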
>
>     Thanks,
>
>     Jess Calciano
>
>     *From:* Calciano, Jess
>     *Sent:* Wednesday, April 08, 2015 2:39 PM
>     *To:* iwg-arbitration-committee at openfabrics.org
>     *Cc:* OFA Lab Mailing List; Dave Wyman; Rupert Dance
>     <rsdance at soft-forge.com>; Cole, Cliff; Mascarenhas, Edward;
>     Sharma, Karun; Thete, Swapna; Hefty, Sean; Yan, Philip W;
>     Flores, Jose F
>     *Subject:* Arbitration request for Intel QLE7340 & QLE7342 HCAs
>     (Jan 2015 OFA Interop Logo Event)
>
>     Hello,
>
>     Intel would like to file an arbitration request for the January
>     2015 OFA Interop Logo Event results for the Intel QLE7340 and
>     QLE7342 HCAs.
>
>     The provided report (attached for reference) shows two failing tests:
>
>     1) TI NFS over RDMA
>
>     2) TI RSockets
>
>     The Intel team has investigated these results and determined that
>     the failures are due to bugs in non-Intel components.
>
>     NFSoRDMA:
>
>     The failure is due to a known Connectathon issue, documented here:
>
>     http://www.spinics.net/lists/linux-nfs/msg16460.html
>
>     RSockets:
>
>     ibv_modify_qp() is failing because an incorrect bit is set in the
>     qp_attr_mask that is returned from the kernel.  With the Intel
>     HCA, bit 21 of the qp_attr_mask is set.  This is not the case
>     with a Mellanox HCA.
>
>     Bit 21 is not defined for userspace. However, it was defined in
>     the kernel as IB_QP_SMAC.
>
>     If librdmacm is modified to mask out this bit, the call
>     succeeds and rstream runs successfully.
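
For reference, masking out a single bit looks like the sketch below. The macro and function names are made up for illustration, and where exactly librdmacm would apply the mask is not stated in the message.

```c
#include <assert.h>

/* Bit 21 of the attribute mask.  The name is invented for this sketch; per
 * the message above, the kernel defined this bit as IB_QP_SMAC and
 * userspace has no definition for it. */
#define TOY_QP_ATTR_BIT_21 (1u << 21)

/* Hypothetical helper: clears bit 21 and leaves every other bit alone. */
static unsigned int toy_strip_bit_21(unsigned int qp_attr_mask)
{
    unsigned int cleaned = qp_attr_mask & ~TOY_QP_ATTR_BIT_21;
    assert((cleaned & TOY_QP_ATTR_BIT_21) == 0);
    return cleaned;
}
```

A mask that comes back with bit 21 set alongside other bits keeps those other bits, and a mask without bit 21 passes through unchanged.
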
>
>     Please let me know if the arbitration committee needs any
>     additional information on the analysis.
>
>     Thanks,
>
>     Jess Calciano
>
