[Ofmfwg] Sunfish meeting notes for September 26, 2025

Lee, Peter peter.lee at necam.com
Fri Sep 26 10:57:41 PDT 2025



  *   Sunfish OFA webinar
     *   Discussed what we want to cover in the webinar. We will give a primer on CXL and present disaggregated resource management as a growing problem in the AI/ML space, then give an overview of Sunfish and how it helps solve the disaggregated resource management problem, using H3 as an example.
     *   OFA marketing team will create a teaser for the webinar.
  *   The Flux development team is asking us to provide our build steps for Flux-Core and Flux-Sched so that they can review them.
     *   Mike has now been able to get Flux running.
     *   Mike will share the build steps and the configuration with the Flux team so they can review them and provide feedback.
     *   Now working to stitch together the various pieces for the near-node flash use case. If we can get near-node flash working with NVMe before SC25, we can also try it with CXL.
  *   Sunfish
     *   New features
        *   No new features. Russ is still working on the update resource support.
     *   New Resource Tree entries
     *   Merge requests?
        *   No new merge requests. Russ found one more H3-related directory that he needs to confirm works with the main branch. Once this is done, Russ can delete the irrelevant branches.
  *   SC25
     *   SC25 Prep for BoF
        *   Moved to Wednesday evening:   https://sc25.conference-program.com/presentation/?id=bof125&sess=sess475
           *   BoF now at 5:15pm on Wednesday November 19, 2025.
        *   Discussed what we want to cover during the BoF and came up with the following timeline.
           *   AI/HPC resource management and how customers are struggling with how to manage and deploy heterogeneous infrastructure.
           *   Introduce Sunfish and what it is and what it is not.
              *   What it is: Manages disaggregated resources, maintains inventory, serves as the 'single source of truth' for the infrastructure inventory, assigns resources from pools, serves as a conduit between clients that want to use the resources and the HW/vendor-specific management tools.
              *   What it is not: A replacement for HW/vendor-specific managers, Swordfish, SONiC, OpenConfig, etc. Sunfish interfaces with these but does not replace them.
           *   Flux workload manager in US national labs.
           *   Near-node flash with both NVMe-oF and CXL FAM.
           *   Audience discussion on which resource pools (GPUs, CXL memory, FPGAs) on a disaggregate-able/assignable fabric should be managed.