[ewg] RE: [ofw] SC'09 BOF - Meeting notes

Sujal Das sujal at mellanox.com
Thu Nov 19 12:34:18 PST 2009


Please see some comments below marked as [Sujal] related to the
acceptance of motions related to RoCEE at the BOD meeting:

-----Original Message-----
From: ewg-bounces at lists.openfabrics.org
[mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Woodruff, Robert J
Sent: Thursday, November 19, 2009 11:31 AM
To: Richard Frank
Cc: ofw at lists.openfabrics.org; OpenFabrics EWG
Subject: [ewg] RE: [ofw] SC'09 BOF - Meeting notes


The arguments against including it are:

1.) We have agreed in the EWG to follow a process where code that is to
be included in OFED be first reviewed and accepted, or at least queued
for acceptance in a future kernel. So far, since the spec is not yet
done, Roland has expressed concerns about the current implementation
and how the final spec may require changes to the implementation, and
as such, does not want to push something upstream, only to have to make
changes later that could impact people that have started to use the
early experimental version.

[Sujal] It was disclosed at the BOD meeting that there is no defined
process for inclusion of new features in OFED releases, rather it is
based on discussions and consensus that happen in EWG meetings.  This
was the basis for acceptance of the modifications to the motion at BOD
and the subsequent voting and acceptance (14 voted in favor, 2 opposed).

2.) We have also discussed in the past that part of the problem with
OFED being able to meet its committed release dates is that we have in
the past allowed major changes into the release way after feature
freeze. We have discussed that this is not the way we should be
working. So, since OFED-1.5 is already at RC2, I think it is too late
to add such a major change.

3.) Since there is a complete branch version of OFED-1.5 that includes
the RoCEE patches, people that want to try this experimental branch can
download that tarball and use it. It is also possible for Mellanox to
include the feature in their release to support their current
customers. I would rather see this kept as an experimental branch for a
while and allow people to get some air time and testing on it before
seeing it go into the main code base. We have to be more and more
careful now with the OFED code base as lots of people are using it in
production and we have to be very careful not to de-stabilize the code.

[Sujal] Once again there is no defined and accepted process in the EWG
about air time etc, and EWG needs to work on implementing the
instructions from the BOD as best as it can using current practices -
which is discussions and consensus within EWG and respecting the
overwhelming number of BOD members who expressed strong interest to have
the technology be part of OFED (and WinOF).

my 2 cents on this one, but it is up to the full EWG members to discuss
the options and make the final decision.

woody


-----Original Message-----
From: Richard Frank [mailto:richard.frank at oracle.com] 
Sent: Thursday, November 19, 2009 10:59 AM
To: Woodruff, Robert J
Cc: OpenFabrics EWG; ofw at lists.openfabrics.org
Subject: Re: [ofw] SC'09 BOF - Meeting notes

How can 1500 lines out of 240k lines be a big change.. do I have these
numbers right - is this the big change you are referring to?

What is the risk area that you are worried about .. do you think it will
break current transports or existing ULPs ?

If it's just about how the implementation is done.. can this be resolved
concurrently with getting the bits available for evaluation now..

As RoCEE is totally transparent to existing ULPs.. any potential changes
would not be visible.. and therefore not an issue for ULP / clients
going forward.. right ?

Oracle would like to see RoCEE get into 1.5.

We are testing with RoCEE now and plan to deploy it fairly soon.. in
very large configurations... so we'd like to see other folks pick it up
and try it out.. ASAP... to allow for time to get fixes into a 1.5.x
release..

It would be great if RoCEE were part of 1.5 even if it were listed as 
"evaluation"..
for now.


Woodruff, Robert J wrote:
> Hmmm - the original mail I sent did not seem to show up
> on the list. Maybe the spam filters caught it because of the
> attachment. Re-sending without the attachment. If anyone wants
> a copy of the final slides, let me know and I can send them
> directly. Below are the notes from the BOF.
>
> woody
>
>
> -----Original Message-----
> From: Woodruff, Robert J 
> Sent: Thursday, November 19, 2009 10:16 AM
> To: Woodruff, Robert J; Tziporet Koren; Gilad Shainer; Yiftah Shahar;
> Betsy Zeller; Smith, Stan; Hal Rosenstock; Jeff Squyres; DK Panda;
> pgrun at systemfabricsworks.com
> Cc: hrap at us.ibm.com; bboas at systemfabricworks.com;
> pgrun at systemfabricsworks.com; rpearson at sxystemfabricsworks.com;
> OpenFabrics EWG; ofw at lists.openfabrics.org
> Subject: SC'09 BOF - Meeting notes and Final Slides
>
> Here are just a few notes from the OFA BOF at SC'09. 
> Stan also took a few notes and can add any additional comments
> if I missed anything in these notes. 
>
> We had some discussion about the new RDMAoE support and
> if we should try to get it into OFED-1.5 or wait till a later release.
>
> Since this is such a major change and OFED-1.5 is already at RC2,
> several people expressed concern that it might be better to not
> hold up OFED-1.5 and release the RDMAoE support in a later
> release after it has been accepted upstream and tested more. We asked
> for a show of hands and more people thought it was better to wait than
> to put the code in at this late date. This is just one data point
> for the EWG to take into consideration when deciding how and when to
> incorporate the new code.
>
> We also discussed the possibility of dropping support for RHEL 4
> for OFED-1.6. Most people seemed to agree that if RHEL 6 is out
> by then, it would probably be OK to drop RHEL 4, as it would
> likely then not be supported anymore by Red Hat. No one voiced a
> strong desire to continue to support EL 4 for OFED-1.6.
>
> In the WinOF section, it was announced that Microsoft has now joined
> the OpenFabrics Alliance as a voting member. Welcome aboard
> Microsoft!!!!
>
> We discussed the topic of whether we should continue to include the
> open source MPIs in the OFED releases. As was the case in Sonoma, there
> were people that expressed both arguments for keeping the MPIs in the
> release and those that thought we should not distribute the MPIs. I
> don't think there is a consensus either way on this one.
>
> On the topic of scalability and possible future enhancements for
> scalability, one person asked for verbs extensions to allow
> asynchronous QP create and modify calls. As for the rest of the
> proposed scalability enhancements, most people agreed that there are
> scalability issues with the RDMA CM and the SA, so work definitely
> needs to be done in this area. There was not too much discussion on
> the other suggestions that Hal had sent in, but scalability should be
> a major topic area for the next developer's workshop in Sonoma.
>
> There was also some discussion on the new collective offload that some
> of the IHVs have started to implement in hardware and that there is a
> need for standard verbs extensions to allow common APIs that will allow
> access to these offloaded collectives. Maybe this could also be a topic
> for the next Sonoma workshop.
>
> On the topic of building Ethernet clusters for HPC, we ran a bit short
> of time and so we decided to defer this topic. Maybe we can have a
> session at Sonoma on this one as well.
>
> Attached is the final version of the slides that were presented. 
>
> woody
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>   
_______________________________________________
ewg mailing list
ewg at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


