[ofw] SC'09 BOF - Meeting notes

Woodruff, Robert J robert.j.woodruff at intel.com
Thu Nov 19 11:31:15 PST 2009


The arguments against including it are:

1.) We have agreed in the EWG to follow a process where code that is to be included in OFED must
first be reviewed and accepted, or at least queued for acceptance in a future kernel.
Since the spec is not yet done, Roland has expressed concerns about the current
implementation and how the final spec may require changes to it, and
as such, does not want to push something upstream only to have to make changes later
that could impact people who have started to use the early experimental version.

2.) We have also discussed in the past that part of the reason OFED has trouble
meeting its committed release dates is that we have allowed major
changes into the release well after feature freeze. We have discussed that this is not
the way we should be working. So, since OFED-1.5 is already at RC2, I think it is
too late to add such a major change.

3.) Since there is a complete branch version of OFED-1.5 that includes the RoCEE patches,
people who want to try this experimental branch can download that tarball and use it.
It is also possible for Mellanox to include the feature in their release to support their
current customers. I would rather see this kept as an experimental branch for a while
to allow people to get some air time and testing on it before seeing it go into the main
code base. We have to be more and more careful now with the OFED code base, as lots of
people are using it in production and we must not de-stabilize the
code.

My 2 cents on this one, but it is up to the full EWG membership to discuss the options and make the final
decision.

woody


-----Original Message-----
From: Richard Frank [mailto:richard.frank at oracle.com] 
Sent: Thursday, November 19, 2009 10:59 AM
To: Woodruff, Robert J
Cc: OpenFabrics EWG; ofw at lists.openfabrics.org
Subject: Re: [ofw] SC'09 BOF - Meeting notes

How can 1500 lines out of 240k lines be a big change.. do I have these
numbers right - is this the big change you are referring to?

What is the risk area that you are worried about.. do you think it will
break current transports or existing ULPs?

If it's just about how the implementation is done.. can this be resolved
concurrently with getting the bits available for evaluation now?

As RoCEE is totally transparent to existing ULPs.. any potential changes
would not be visible.. and therefore not an issue for ULP / clients
going forward.. right?

Oracle would like to see RoCEE get into 1.5.

We are testing with RoCEE now and plan to deploy it fairly soon.. in
very large configurations... so we'd like to see other folks pick it up
and try it out.. ASAP... to allow for time to get fixes into a 1.5.x
release..

It would be great if RoCEE were part of 1.5 even if it were listed as
"evaluation".. for now.


Woodruff, Robert J wrote:
> Hmmm - the original mail I sent did not seem to show up
> on the list. Maybe the spam filters caught it because of the
> attachment. Re-sending without the attachment. If anyone wants
> a copy of the final slides, let me know and I can send them
> directly. Below are the notes from the BOF.
>
> woody
>
>
> -----Original Message-----
> From: Woodruff, Robert J 
> Sent: Thursday, November 19, 2009 10:16 AM
> To: Woodruff, Robert J; Tziporet Koren; Gilad Shainer; Yiftah Shahar; Betsy Zeller; Smith, Stan; HalRosenstock; Jeff Squyres; DKPanda; pgrun at systemfabricsworks.com
> Cc: hrap at us.ibm.com; bboas at systemfabricworks.com; pgrun at systemfabricsworks.com; rpearson at sxystemfabricsworks.com; OpenFabrics EWG; ofw at lists.openfabrics.org
> Subject: SC'09 BOF - Meeting notes and Final Slides
>
> Here are just a few notes from the OFA BOF at SC'09. 
> Stan also took a few notes and can add any additional comments
> if I missed anything in these notes. 
>
> We had some discussion about the new RDMAoE support and
> if we should try to get it into OFED-1.5 or wait till a later release.
>
> Since this is such a major change and OFED-1.5 is already at RC2,
> several people expressed concern that it might be better to not
> hold up OFED-1.5 and release the RDMAoE support in a later
> release after it has been accepted upstream and tested more. We asked for a show
> of hands and more people thought it was better to wait than to 
> put the code in at this late date. This is just one data point
> for the EWG to take into consideration when deciding how and when to incorporate the
> new code.
>
> We also discussed the possibility of dropping support for RHEL 4
> for OFED-1.6. Most people seemed to agree that if RHEL 6 is out
> by then, it would probably be OK to drop RHEL 4, as it would
> likely no longer be supported by Red Hat by then. No one voiced a strong
> desire to continue to support RHEL 4 for OFED-1.6.
>
> In the WinOF section, it was announced that Microsoft has now joined
> the OpenFabrics Alliance as a voting member. Welcome aboard Microsoft!!!!
>
> We discussed whether we should continue to include the open source
> MPIs in the OFED releases. As was the case in Sonoma, some people
> argued for keeping the MPIs in the release and others
> thought we should not distribute the MPIs. I don't think there is a 
> consensus either way on this one. 
>
> On the topic of scalability and possible future enhancements for scalability,
> one person asked for verbs extensions to allow asynchronous QP create and
> modify calls.  As for the rest of the proposed scalability enhancements, 
> most people agreed that there are scalability issues with the RDMA CM
> and the SA, so work definitely needs to be done in this area. There was not
> too much discussion on the other suggestions that Hal had sent in, but
> scalability should be a major topic area for the next 
> developer's workshop in Sonoma. 
>
> There was also some discussion on the new collective offload that some
> of the IHVs have started to implement in hardware and that there is a need
> for standard verbs extensions to allow common APIs that will allow access 
> to these offloaded collectives.  Maybe this could also be a topic for the 
> next Sonoma workshop. 
>
> On the topic of building Ethernet clusters for HPC, we ran a bit short of
> time and so we decided to defer this topic. Maybe we can have a session
> in Sonoma on this one as well. 
>
> Attached is the final version of the slides that were presented. 
>
> woody
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>   


