[ofiwg] OFIWG Meeting Minutes (6/11/2024)

Xiong, Jianxin jianxin.xiong at intel.com
Tue Jun 11 10:20:58 PDT 2024


Thanks to Alexia for taking the notes!

6/11
Participants:
Alexia Ingerson (Intel)
Jianxin Xiong (Intel)
Anton Bodner (Intel)
Arlin Davis (Intel)
Ben Lynam (Cornelis)
Dmitry Durnov (Intel)
Howard Pritchard (LANL)
Jerome Soumagne
Juee Desai (Intel)
Ken Raffenetti (ANL)
Maria Garzaran (Intel)
Mike Uttormark (HPE)
Nathan Hanford
Peinan Zhang (Intel)
Rajlaxmi (Intel)
Sean Hefty (Nvidia)
Shi Jin (AWS)
Steve Welch (HPE)
Todd Rimer (Intel)
Zach (Intel)

** Summary ** 

Proposal: add a deprecation phase to the 2.0 release to address community concerns about the potential disruption and maintenance overhead of moving directly to 2.0 without a deprecation strategy. A timeline is needed so that middlewares adhere to it and have enough time to stop using the deprecated features. Need to talk directly with middleware maintainers to discuss the list of features to be deprecated and get feedback on the timeline. There are concerns that one year is not enough. Timeline is TBD.

** Notes **

Jianxin:	
Concerns from the community about potential disruption from a direct move to 2.0 without a deprecation strategy. Middlewares/providers may not be ready immediately. There is also added overhead for concurrent 1.x and 2.x support. Proposing to add a deprecation phase: deprecate features instead of removing them to keep API compatibility, then remove them after a predetermined amount of time (needs discussion, one year?)

The change still gets released as 2.0, so there is no need to maintain a separate 1.x branch. A symbolic link from 2.0 to 1.0 is kept. DL providers should still work.

Shi: Seems reasonable.

Todd: What's a reasonable timeline?

Jianxin:	That's a question for middleware/apps using OFI (MPI, CCL, DAOS, etc)

Todd: Middlewares need enough time to figure out alternatives/strategy to stop using the feature. Is a year enough?

Jianxin:	Yes, need to make a timeline to make sure middlewares adhere to it. Still need to figure out concrete timeline.

Shi:  How do we approach conversations with middlewares about timeline?

Jianxin:	
Need to have conversations outside OFIWG directly with middlewares about timeline and deprecated features:

  -  Async AV insertions, remove implementation
  -  FI_AV_MAP
  -  FI_THREAD_FID, FI_THREAD_ENDPOINT, remove implementation
  -  Separate control and data progress
  -  comp_order attributes
  -  FI_ORDER_NONE, FI_ORDER_STRICT
  -  total_buffered_recv
  -  fid_wait and fid_poll
  -  FI_WAIT_MUTEX_COND
  -  FI_MR_BASIC, FI_MR_SCALABLE, FI_LOCAL_MR
  -  Async MR registration
  -  Binding EP to multiple CQs (maybe keeping, pushback from UEC)

  New features same as before
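
For middleware maintainers auditing their code against the list above, the "deprecate first, remove later" behavior could be pictured roughly as follows. This is only an illustrative sketch, not libfabric code: the helper names (demo_is_deprecated, demo_request_feature) and the "removed" flag are hypothetical, and the table holds only the flag-style names from the list. During the deprecation phase a deprecated name is still accepted with a warning; after removal the same request fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Flag-style names copied from the deprecation list in these minutes. */
static const char *const demo_deprecated_names[] = {
    "FI_AV_MAP",
    "FI_THREAD_FID",
    "FI_THREAD_ENDPOINT",
    "FI_ORDER_NONE",
    "FI_ORDER_STRICT",
    "FI_WAIT_MUTEX_COND",
    "FI_MR_BASIC",
    "FI_MR_SCALABLE",
    "FI_LOCAL_MR",
};

/* Hypothetical helper: true if name appears in the table above. */
bool demo_is_deprecated(const char *name)
{
    size_t n = sizeof(demo_deprecated_names) / sizeof(demo_deprecated_names[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, demo_deprecated_names[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Hypothetical request check.
 * removed == 0: deprecation phase, a deprecated name is still accepted
 *               (returns 0) and a warning is written to stderr.
 * removed != 0: after removal, a deprecated name is rejected (returns -1).
 * Names not in the table are always accepted (returns 0).
 */
int demo_request_feature(const char *name, int removed)
{
    if (!demo_is_deprecated(name))
        return 0;

    if (removed) {
        fprintf(stderr, "%s: removed\n", name);
        return -1;
    }

    fprintf(stderr, "%s: deprecated, still accepted for now\n", name);
    return 0;
}
```

With removed set to 0, existing callers keep working and only see the warning, which is the compatibility property the deprecation phase is meant to preserve. When removal actually happens is still open, so it is left as a parameter here.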

Roadmap update:

  -  All releases can be branched off main (no 1.x main branch)
  -  Release 1.22 in July (no deprecation)
  -  Release 2.0 alpha right after 1.22 (with deprecation)
  -  Release 2.0 beta in September
  -  Release 2.0 GA in November (still fully compatible with 1.x, deprecated features removed one year later?)

Howard: One year is not very long. Will distros have both 1.x and 2.x?

Jianxin:	Distros will pick up 2.x. The library will still have a symbolic link to 1.x, so we need to make sure we don't break anything. Need more feedback on the removal timeline.

