[openib-general] Need for ONE OpenIB Release process that all members can agree to and that follows OpenIB Bylaws

Bill Boas bboas at systemfabricworks.com
Mon Feb 27 22:49:05 PST 2006


There appear to be two groups within OpenIB thinking about different
approaches to preparing the code for Release 1.0. One group is thinking
about downstreaming it to RedHat and Novell; the other seems to be
thinking about releases from some IB suppliers separate from those of others.

 

Let's remind ourselves of the purposes for which OpenIB was created and what
all of the member companies have just re-affirmed in the Board meeting last
Friday (by approving the re-worked By-laws). The principles are, I believe
(if there are misstatements below, let's discuss them openly):

 

1)       OpenIB develops open source code creating a software stack. OpenIB
(now OpenFabrics Alliance) is a corporation with Bylaws that all members
should obey if they want the corporation to continue to function. It will
only survive if, in general, all members' interests are served
simultaneously with each member's own self-interest.

2)       OpenIB members, by a 2/3rds vote of the members, have to approve the
content of that stack through the Proposal process described in Section 12 of
the Bylaws. It is not up to a single member or group of members to decide on
their own what is or is not in the OpenIB stack. This is deliberate to
prevent one or more members gaining competitive advantage through the OpenIB
stack over other members.

3)       OpenIB downstreams kernel code to kernel.org

4)       OpenIB code is distributed to end customers (like Wall St., labs,
etc) and to mid tier customers of OpenIB (Oracle, IBM, Sun, Dell, LNXI etc.)
via Linux distributions such as RedHat and Novell.

5)       End customers told the IB companies in February 2004 and in
December 2005 at Credit Suisse (HSIR meeting) that they wanted ONE OpenIB
stack that runs on every IB vendor's hardware, that interoperates with all
other IB vendors h/w and s/w, is used by all mid-tier suppliers and that it
comes with their Linux distribution.

 

I realize that so far in OpenIB's evolution we have not worked out the issue
of how to support end-customers while following these principles for the
release process. But that, I suggest, is not a valid reason for breaking
these principles. We should be able to deal with "Release" as one process
and "Support" as another process - though of course there will be linkage
between them but they are not the same process. The way to do one is not
necessarily the way to do the other.

 

This email is an appeal to the two groups to work together, not to work
separately, and to work on solving these issues for the membership as a
whole, not just their own company, or a select group. Please bring to the
Board a proposal that serves all the membership.

 

Here's what one group seems to be thinking (edited to remove "I"):

 

"Here is a first cut at the set of components (protocols, drivers, userspace
bits) that we think we should be supporting in 1.0.  Please look over it and
let us know if we are missing anything.

 

HCA support (both kernel driver and userspace verbs components):

 

      * ehca

      * ipath

      * mthca

 

IB protocols:

 

      * IPoIB

      * RC

      * SDP

      * SRQ

      * UC

      * UD

 

Userland software:

 

      * libibverbs

      * libsdp

      * opensm

 

As far as we can tell, most of the rest of OpenIB userland (libibcm,
libibat, libibmad, etc) is logically part of OpenSM, can be treated as such
(I think Doug is already doing this with his Red Hat spec files) and is
unlikely to be used by other applications.  Am I way off?

 

Components that we don't know what to do about, and will likely want to drop
unless someone can vouch for them:

 

      * iSER

      * SRP

      * uDAPL"

 

Here's what the other group suggested:

 

"Openib Commercial Grade Release 1.0 release criteria


1)       CPU Architectures: 

a)       x86_32 (Xeon) 

b)       x86_64 (Nocona, Opteron) 

c)       ia64

d)       PPC64 (Power5, Power6) - Mellanox does not support these systems 

2)       Linux distributors and kernels 

a)       RH: AS EL4 up3; Fedora C4 last update , and maybe FC5

b)       SuSE: SuSE 10 last update (open - SLES10 beta)

c)       kernel.org: the latest that is available when generating rc1. In
1.0 it will probably be 2.6.16 (might be 2.6.17).

3)       Packaging and installation

a)       The openib release will be packaged in one tarball for both kernel
and user-level.

b)       One install script will support full installation. The install will
support typical and custom components

I will send a different document with install definition to be reviewed and
agreed between all.

4)       HCA and Switch Support:

a)       HCAs: InfiniHost, InfiniHost III Ex (both modes: with memory and
MemFree), InfiniHost III Lx 

b)       Switches: Need to support all vendors' production switches - each
vendor should send the list. 

5)       Switch Management Interoperability testing 

a)       Follow the CIWG-OpenIB HCA-OEM Switch Interop Test Plan

6)       Feature set per ULP: 

a)       Will be defined later with each ULP maintainer. 

7)       Minimum cluster size to be tested 

a)       Need at least 128 nodes cluster, bigger is better. 

8)       Scalability requirements 

a)       SM: 

i)         Bringup a subnet with 1,000 nodes in 2 minutes

ii)       SM should not be a bottleneck in any application running (IPoIB)

b)       MPI:

i)         MPI runner - should be able to launch thousands of processes (say
50,000) in a bounded time manner.

ii)       Memory consumption - should be able to run many processes on the
same node (for now, 8 processes is the upper limit with the Opteron
machines), in a many node (thousands of nodes) installation.

iii)      Sending HUGE messages in collectives - MPI should not fail for
limited physical memory. 

9)       Performance requirements:
First we need to agree on the performance benchmark for each ULP:

a)       Basic verbs - performance tests in openib (send, RDMA read/write
latency & BW)

b)       IPoIB - netperf

c)       MPI - Pallas

d)       SDP - iperf

e)       SRP - iometer

f)     iSER - iometer

10)   Documentation requirements 

a)       Product brief

b)       Installation guide 

c)       User guide 

d)       Release notes 

e)       Troubleshooting

f)     Test Plan and Test Report 

11)   Storage target test requirements 

a)       Engenio target - Mellanox will be responsible for verification

b)       Cisco & SST - please add more target systems 

12)   Firmware and Hardware versions to be tested 

a)       Both DDR and SDR modes should be supported. 

b)       FW burned should be the last official released by Mellanox:

i)         InfiniHost III Lx: fw-25204-1.0.800

ii)       InfiniHost III Ex: fw-25218-5.1.400 and fw-25208-4.7.600 (both
will be released in 2 weeks)

iii)      InfiniHost: fw-23108-3.4.000

iv)      InfiniScale III - fw-47396-0.8.3

v)        InfiniScale - fw-43132-5.5.0

13)   Specifications compliance: 

a)       Verbs & management: InfiniBand Architecture Specification, Volume
1, Release 1.2 

b)       IPoIB: www.ietf.org: draft-ietf-ipoib-architecture-04 and
draft-ietf-ipoib-ip-over-infiniband-07 

c)       SDP: Annex A4 of the InfiniBand Architecture Specification, Volume
1, Release 1.2 

d)        SRP: SCSI RDMA Protocol-2 (SRP-2), Doc. no. T10/1524-D
(http://www.t10.org/ftp/t10/drafts/srp2/srp2r00a.pdf).

e)       MPI: http://www.mpi-forum.org/docs/mpi-11-html/mpi-report.html

f)         iSER:
http://www.ietf.org/internet-drafts/draft-hufferd-iser-ib-01.pdf

g)       RDS: SS can you provide info 

The following two items are very important for the SW stack QA but not
gating for starting the release process.

1)       ISV test requirements - coverage for all ULPs

2)      Database test requirements

Cisco, SS and Voltaire should define those, since they already have test
beds for commercial applications and databases."
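
To give a sense of the test effort implied by items 1 and 2 of the second
proposal, here is a minimal sketch in Python that enumerates the
architecture and distribution combinations. It assumes every architecture is
tested on every distribution, which the criteria do not actually state. The
tentative entries (FC5, SLES10 beta, 2.6.17) are left out, and the name
build_matrix is illustrative only, not part of any OpenIB tooling.

```python
from itertools import product

# Labels taken from items 1 and 2 of the quoted release criteria.
ARCHITECTURES = ["x86_32", "x86_64", "ia64", "ppc64"]
DISTRIBUTIONS = ["RH AS EL4 up3", "Fedora C4", "SuSE 10", "kernel.org 2.6.16"]


def build_matrix(architectures, distributions):
    """Return every (architecture, distribution) pair as a list of tuples.

    Assumption: full cross product. A real plan would likely drop some pairs.
    """
    return list(product(architectures, distributions))


matrix = build_matrix(ARCHITECTURES, DISTRIBUTIONS)

if __name__ == "__main__":
    for arch, distro in matrix:
        print(f"{arch:8} {distro}")
    print(f"{len(matrix)} combinations")
```

Even this reduced list gives 16 combinations before HCAs, firmware versions
and ULPs are multiplied in, which is one reason the membership as a whole
should agree on what is gating for 1.0.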

 

 

 

 
