[Iwg-arbitration-committee] Arbitration Request for the S2A9900
Rupert Dance
rsdance at soft-forge.com
Mon Jan 30 17:06:20 PST 2012
Marty,
Thank you for submitting this arbitration request. I have sent it to the
Arbitration Committee reflector so that it reaches all members of the
committee. If you have any questions regarding the Arbitration procedure,
please refer to Section 8 of the Logo Policy Document:
http://www.iol.unh.edu/services/testing/ofa/logoprogram/ .
Thank you,
Rupert Dance
Co-Chair - OFA IWG
Software Forge, Inc.
2 Greenleaf Woods Drive #301
Portsmouth, NH 03801
Phone: 603-319-8486
Fax: 603-319-8489
www.soft-forge.com
-----Original Message-----
From: Marty Schlining [mailto:mschlining at ddn.com]
Sent: Monday, January 30, 2012 7:52 PM
To: Rupert Dance; Nickolas Wood
Cc: Victor Hendrickse; Gordon Manning; Lee McBryde; Jon Giles
Subject: Arbitration Request for the S2A9900
The S2A 9900 failed two tests during the recent Logo event for OFED 1.5.4.
Both failures are related to link initialization. A further concern is that
these failures were either not observed at, or had actually been solved
during, the Fall 2011 Interop/Debug Event.
The S2A 9900 is a robust, highly scalable storage platform. It comes in two
versions: one with 8 x FC-8 ports and one with 8 x DDR IB ports. This system
is used in a variety of applications worldwide, including HPC, rich-media
content development and delivery, cloud content delivery, and banking. The
S2A 9900 DDR IB-connected storage is in widespread use throughout the world
as primary and nearline storage. Customers range from commercial to
government to military.
The S2A 9900 is still being sold and will require support for the next few
years. It is highly likely that one of the HCA types discussed below (the
Mellanox FDR HCA or the Qlogic 7340/7342 HCAs) will be connected directly to
the S2A 9900 when using OFED 1.5.4.
The S2A9900 is deployed in the majority of the world's top 100
supercomputers.
Some other notable deployments:
- Large Data Program (U.S. Naval Research Laboratory)
- Xbox Live
- Boeing
- Oak Ridge National Labs Spider (World's Largest Lustre Filesystem)
- CEA (Tera 10 Supercomputer)
- NASCAR
- Technicolor
- Disney/Pixar
- NASA
- NOAA
- Microsoft
- Universal Studios
The S2A 9900 uses "legacy" HCAs, that is, HCAs that were produced before the
adoption of the IBTA 1.2 DDR auto-negotiation (AN) specification. The HCAs
in the S2A 9900 use a Mellanox proprietary (now published) auto-negotiation
technique. According to the engineers at Mellanox, this AN is embedded in the
hardware of the S2A 9900's HCAs and cannot be changed.
Link to Mellanox's legacy DDR AN specification:
http://www.iol.unh.edu/services/testing/ofa/training/Mellanox_IB_DDR_Auto-negotiation_Specification_1_0.pdf
1.) The S2A 9900 fails to link at DDR rates to the Mellanox FDR HCA. The link
comes up at SDR instead of DDR.
A firmware update by Mellanox to both their FDR switches and FDR HCAs solved
this issue at the Fall 2011 Debug Event. The firmware tested at the Logo
Event apparently did not include this fix, which may have been an oversight
by Mellanox. We feel that Mellanox should update their HCA firmware to a
version that incorporates the legacy DDR AN, and the HCA should then be
retested.
2.) The S2A 9900 fails to link at all to the Qlogic 7340 and Qlogic 7342
HCAs.
This is a new behavior. In the past, the Qlogic HCAs at least linked at SDR
rates to the S2A 9900. I performed a set of link initialization tests using
OFED 1.5.3.1 and OFED 1.5.4 to reproduce the behavior.
In both OFED 1.5.3.1 and OFED 1.5.4, the compat_ddr_negotiate flag of the
ib_qib driver defaults to '1'. According to the OFED source code, the
compat_ddr_negotiate flag should be set to '0' to attempt pre-IBTA 1.2
auto-negotiation.
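Because the outcome depends on which value the driver was actually loaded with, it can help to read the parameter back before each test run. The following is a minimal Python sketch, assuming ib_qib exports the parameter as a readable file at /sys/module/ib_qib/parameters/compat_ddr_negotiate (the standard Linux location for module parameters; whether it is exported depends on the driver build). The helper name read_module_param is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysfs location for module parameters. Whether ib_qib
# exports compat_ddr_negotiate here is an assumption about the driver build.
PARAM_PATH = Path("/sys/module/ib_qib/parameters/compat_ddr_negotiate")


def read_module_param(path: Path = PARAM_PATH) -> Optional[int]:
    """Return the integer value of a module parameter, or None if unavailable."""
    try:
        text = path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or insufficient permissions.
        return None
    try:
        return int(text)
    except ValueError:
        # The file exists but does not hold an integer.
        return None


if __name__ == "__main__":
    value = read_module_param()
    if value is None:
        print("compat_ddr_negotiate: not available (is ib_qib loaded?)")
    else:
        print(f"compat_ddr_negotiate = {value}")
```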
In OFED 1.5.3.1, when this flag is set to 1 (default), there is roughly a
50/50 chance that the link between the S2A 9900 and the QLE7342 HCA will come
up at SDR; the other half of the time, the link does not come up at all. Most
link failures occur when a cable is reinserted into the QLE7342 HCA. When
compat_ddr_negotiate is set to 0, the link consistently comes up at SDR even
after cable reinsertions and driver restarts.
For OFED 1.5.4, the observed behavior between the S2A 9900 and the QLE7342
HCA has changed. When compat_ddr_negotiate is set to 1 (default), the link
between the S2A 9900 and the QLE7342 never comes up. This is what was
observed during the Logo event testing. However, when compat_ddr_negotiate
is set to 0, the link consistently comes up at SDR even after cable
reinsertions and driver restarts.
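When repeating these tests across cable reinsertions and driver restarts, the negotiated speed can be checked from the standard InfiniBand sysfs attributes rather than by eye. The sketch below assumes the ib_core "rate" attribute format, e.g. "10 Gb/sec (4X)" for SDR and "20 Gb/sec (4X DDR)" for DDR, and that "phys_state" reads "5: LinkUp" once the physical link has trained. The default device name qib0 and both helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches the "(4X)" or "(4X DDR)" suffix of the ib_core sysfs "rate" attribute.
_RATE_RE = re.compile(r"\((\d+)X(?:\s+([A-Z0-9]+))?\)")


def parse_ib_rate(rate_text: str) -> Optional[Tuple[int, str]]:
    """Return (link width, speed name) parsed from a sysfs rate string.

    A missing speed suffix is treated as SDR, e.g. "10 Gb/sec (4X)" -> (4, "SDR").
    """
    match = _RATE_RE.search(rate_text)
    if match is None:
        return None
    width = int(match.group(1))
    speed = match.group(2) or "SDR"
    return width, speed


def port_link_speed(device: str = "qib0", port: int = 1) -> Optional[Tuple[int, str]]:
    """Return (width, speed) for a physically up port, or None otherwise.

    "qib0" is a hypothetical default; use the device listed under
    /sys/class/infiniband on the host under test.
    """
    port_dir = Path("/sys/class/infiniband") / device / "ports" / str(port)
    try:
        phys_state = (port_dir / "phys_state").read_text()
        rate = (port_dir / "rate").read_text()
    except OSError:
        # Device or port not present, or driver not loaded.
        return None
    # Only report a speed once the physical link is up ("5: LinkUp").
    if phys_state.split(":", 1)[-1].strip() != "LinkUp":
        return None
    return parse_ib_rate(rate)


if __name__ == "__main__":
    print(port_link_speed())
```

A result of (4, "SDR") corresponds to the SDR outcome described above, and None to the link not coming up at all.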
Qlogic has made changes to their switch products (12300 QDR IB switch) that
have made those switches and the S2A 9900 fully compatible. However, no
changes have ever been made to the ib_qib driver in OFED to make it fully
compatible at DDR rates with the S2A 9900. I believe the 'compat_ddr_negotiate'
flag is intended to negotiate properly at DDR rates with legacy HCAs, but it
does not fully work; at best, it allows the link to come up at SDR.
We feel that any limitations of the QLE7342 and/or the S2A 9900 should be
documented. The fact that a method exists to make the QLE7342 and the S2A 9900
link at SDR rates should put both in the "warning" category for Logo testing.
Getting the link to come up consistently at SDR does, however, require loading
the ib_qib driver with a special flag (compat_ddr_negotiate=0), and that
should also be documented. It may also be appropriate to use this flag when
performing link initialization tests between the QLE7342 and the S2A 9900.
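For reference, the flag can be applied either once on the modprobe command line or persistently through a modprobe.d "options" line. The Python sketch below only builds those strings and does not load the driver. The file name /etc/modprobe.d/ib_qib.conf is an assumption, and actually reloading ib_qib requires root and will drop any active links on that host.

```python
from typing import Dict, List


def modprobe_options_line(module: str, params: Dict[str, int]) -> str:
    """Build a modprobe.d line such as 'options ib_qib compat_ddr_negotiate=0'."""
    settings = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {settings}"


def modprobe_command(module: str, params: Dict[str, int]) -> List[str]:
    """Build the argv for loading a module once with the given parameters."""
    return ["modprobe", module] + [f"{name}={value}" for name, value in params.items()]


if __name__ == "__main__":
    params = {"compat_ddr_negotiate": 0}
    # Persistent form: place this line in a modprobe.d file
    # (e.g. /etc/modprobe.d/ib_qib.conf, an assumed name).
    print(modprobe_options_line("ib_qib", params))
    # One-off form: run as root after unloading ib_qib,
    # e.g. subprocess.run(cmd, check=True).
    print(" ".join(modprobe_command("ib_qib", params)))
```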
Ideally, we feel Qlogic should modify their driver to fully incorporate the
pre-IBTA 1.2 AN and link at DDR rates to the S2A 9900. This would be of
great benefit to many who have attempted to use Qlogic HCAs connected
directly to legacy HCAs.
Regards,
Martin Schlining
Sr. Software Engineer
DataDirect Networks