[Ofmfwg] Sunfish meeting agenda for 17 January 2025

Aguilar, Michael James mjaguil at sandia.gov
Wed Jan 15 10:38:56 PST 2025



From: Aguilar, Michael James <mjaguil at sandia.gov>
Date: Wednesday, January 15, 2025 at 11:37 AM
To: Phil Cayton <phil.cayton at intel.com>, Herrell, Russ W (Senior System Architect) <russ.herrell at hpe.com>, Hanford, Nathan Patrick <hanford1 at llnl.gov>, Hobbs, William <hobbs17 at llnl.gov>, Corwell, Sophia E. <secorwe at sandia.gov>, Appleby, Catherine <caapple at sandia.gov>, oguchi.naoki at jp.fujitsu.com <oguchi.naoki at jp.fujitsu.com>, Huai-Yang Pan <huaiyangpan at gmail.com>
Subject: Sunfish meeting agenda for 17 January 2025
Everyone,

Here is a tentative meeting agenda for 17 January 2025.

In the last week, we’ve started documenting the steps to install the Sunfish .whl library under Python 3.9 and to load the library into the Sunfish Server as it currently stands. In addition, we’ve started platform work on Flux and discussed approaches to integrating Sunfish with Flux using the Flux Jobtap, Prolog, and Epilog infrastructure.


  1.  Brian Pan from H3 will be demonstrating new work on their CXL hardware and the Sunfish Agent.

     *   ‘We are currently developing a new build scheduled for release in March. Attached you'll find a detailed description of the solution.’

  2.  Flux/Sunfish integration to provide extended CDI capabilities to the Workload Manager

     *   Jobtap C code framework for Flux allocations---hwloc and event subscriptions from Sunfish

         1.  flux batch -N <# of nodes> <script path and name>
         2.  flux alloc -N <# of nodes>
         3.  flux batch -t <time limit>
         4.  flux batch -T <thread limit>
         5.  flux batch -g <number of GPUs per task>
         6.  flux batch -c <number of cores per task>
         7.  flux batch --setattr=system.bank=bank
         8.  flux batch --job-name=job_name
         9.  flux batch -x   ;# exclusive nodes
         10. flux batch --requires=<properties: a,b & hostlist | ranks>   ;# constraints
         11. flux batch --urgency=N
         12. flux batch -o <cpu-affinity, gpu-affinity, verbose, pmi, stage-in, hwloc.restrict, hwloc.xmlfile, output.mode>
         13. flux run -N 200 hostname
         14. sudo flux queue start --all

     *   Flux Prolog framework
     *   Flux Epilog framework
     *   Flux Resource List

  3.  Building out the documentation (working on this on Mondays)
  4.  Kubernetes framework---first stab (this is after Flux, obviously)
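
For discussion, several of the flux batch options listed under the Flux/Sunfish item can be combined into one submission. Below is a minimal sketch, not part of Sunfish or Flux, that only assembles the command line and does not run it. The helper name build_flux_batch, the script path ./job.sh, and the job name sunfish-test are hypothetical; the flags are the ones already listed in the agenda.

```python
import shlex
from typing import List, Optional


def build_flux_batch(script: str,
                     nodes: int,
                     time_limit: Optional[str] = None,
                     cores_per_task: Optional[int] = None,
                     gpus_per_task: Optional[int] = None,
                     job_name: Optional[str] = None,
                     bank: Optional[str] = None,
                     exclusive: bool = False) -> List[str]:
    """Assemble a `flux batch` argument list (hypothetical helper)."""
    argv = ["flux", "batch", "-N", str(nodes)]
    if time_limit is not None:
        argv += ["-t", time_limit]           # time limit, e.g. "30m"
    if cores_per_task is not None:
        argv += ["-c", str(cores_per_task)]  # cores per task
    if gpus_per_task is not None:
        argv += ["-g", str(gpus_per_task)]   # GPUs per task
    if job_name is not None:
        argv.append(f"--job-name={job_name}")
    if bank is not None:
        argv.append(f"--setattr=system.bank={bank}")
    if exclusive:
        argv.append("-x")                    # exclusive nodes
    argv.append(script)                      # batch script goes last
    return argv


if __name__ == "__main__":
    # Example values are placeholders, not a real job.
    cmd = build_flux_batch("./job.sh", nodes=2, time_limit="30m",
                           cores_per_task=4, job_name="sunfish-test",
                           exclusive=True)
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result could later be handed to subprocess.run without shell quoting concerns, if we decide to drive submissions from a script.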