From newsletters at openfabrics.org Fri Apr 12 16:48:22 2024
From: newsletters at openfabrics.org (newsletters at openfabrics.org)
Date: Fri, 12 Apr 2024 16:48:22 -0700 (PDT)
Subject: [libfabric-users] REGISTER NOW FOR THE OPENFABRICS ALLIANCE (OFA) VIRTUAL WORKSHOP, APRIL 22-23, 2024
Message-ID: <1712965702.578838@openfabrics.org>

OFA Virtual Workshop 2024
April 22-23, 2024

Join us for the 20th annual OpenFabrics Alliance (OFA) Virtual Workshop, taking place April 22-23, 2024.

The Annual OFA Workshop is a premier means of fostering collaboration among those who develop fabrics, deploy fabrics, and create applications that rely on fabrics. It is the only event of its kind where fabric developers and users can discuss emerging fabric technologies, collaborate on future industry requirements, and address problems that exist today.

8:00am - 2:00pm PT
Virtual, hosted on WebEx

Registration

The Workshop is hosted on WebEx. Both attendees and speakers will use the same WebEx meeting information. Registration is simple: go to the 2024 OFA Virtual Workshop page to download the invitation.

Please be respectful and courteous in this virtual environment. Contact press at openfabrics.org with questions.

Sessions

* Keynote - Pavan Balaji, Meta
* "OFI 2.0 Update" - Jianxin Xiong, Intel
* "Status of OpenFabrics Interfaces (OFI) Support in MPICH" - Yanfei Guo, Argonne National Laboratory
* "Designing In-Network Computing Aware Reduction Collectives in MPI" - Dhabaleswar Panda and Bharath Ramesh, The Ohio State University
* "High Performance & Scalable MPI library over Broadcom RoCE" - Mustafa Abduljabbar, The Ohio State University; Hemal Shah, Broadcom Inc; and Shulei Xu, The Ohio State University
* "Scaling Large Language Model Training using Hybrid GPU-based Compression in MVAPICH" - Aamir Shafi and Lang Xu, The Ohio State University
* "OFI Integrated Shared Memory Offload" -
  Alexia Ingerson, Intel; Shi Jin, Amazon; and Amir Shehata, Oak Ridge National Laboratories
* "Managing Composable Disaggregated Infrastructure With OFA Sunfish" - Christian Pinto, IBM Research Europe; Michael Aguilar, Sandia National Laboratories; Phil Cayton, Intel; Russ Herrell, Hewlett Packard Enterprise; and Brian Pan, H3 Platform
* "An Integrated Deep Reinforcement Learning Agent for Sunfish and HPC Workload Manager Composable Disaggregated Resource Scheduling" - Catherine Appleby and Michael Aguilar, Sandia National Laboratories
* "Cornelis Networks CN5000 Adapter and Software Update" - Dennis Dalessandro, Cornelis Networks
* "System Composability Using CXL" - Kurtis Bowman, CXL Consortium MWG Co-Chair
* "Optimized All-to-all Connection Establishment for High-Performance MPI Libraries over InfiniBand" - Mustafa Abduljabbar and Dhabaleswar Panda, The Ohio State University
* "RecoNIC: RDMA-enabled Compute Offloading on FPGA-based SmartNIC" - Guanwen Zhong, AMD
* "Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters" - Hari Subramoni and Qinghua Zhou, The Ohio State University
* "How to setup RDMA CI using the FSDP cluster" - Doug Ledford, Redhat
* "How to do manual RDMA testing using the FSDP cluster" - Jeremy Spewock, UNH InterOperability Lab (IOL)

From newsletters at openfabrics.org Fri Apr 19 15:31:26 2024
From: newsletters at openfabrics.org (newsletters at openfabrics.org)
Date: Fri, 19 Apr 2024 15:31:26 -0700 (PDT)
Subject: [libfabric-users] (no subject)
Message-ID: <1713565886.3800614@openfabrics.org>

OFA Virtual Workshop 2024
April 22-23, 2024

The 20th annual OpenFabrics Alliance (OFA) Virtual Workshop is taking place April 22-23, 2024. It runs 8:00am-2:00pm PT each day.
The Annual OFA Workshop is a premier means of fostering collaboration among those who develop fabrics, deploy fabrics, and create applications that rely on fabrics. It is the only event of its kind where fabric developers and users can discuss emerging fabric technologies, collaborate on future industry requirements, and address problems that exist today.

Registration

The Workshop is hosted on WebEx. Both attendees and speakers will use the same WebEx meeting information. The Workshop is in Pacific Time. Registration is simple: visit the OpenFabrics Alliance Virtual Workshop 2024 website and download the WebEx information.

Please be respectful and courteous in this virtual environment. Contact press at openfabrics.org with questions.

2024 OFA Virtual Workshop Sessions

Day 1: Monday, April 22, 8:00am - 2:00pm PT

Opening Remarks
  Speaker: Phil Cayton
  Time: 8:00-8:05am PT

Keynote
  Speaker: Pavan Balaji, Meta
  Time: 8:05-9:00am PT

Session 1: "OFI 2.0 Update"
  Speaker: Jianxin Xiong, Intel
  Time: 9:00-9:30am PT

15-minute break
  Time: 9:30-9:45am PT

Session 2: "Status of OpenFabrics Interfaces (OFI) Support in MPICH"
  Speaker: Yanfei Guo, Argonne National Laboratory
  Time: 9:45-10:15am PT

Session 3: "Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters"
  Speakers: Hari Subramoni and Qinghua Zhou, The Ohio State University
  Time: 10:15-10:45am PT

15-minute break
  Time: 10:45-11:00am PT

Session 4: "High Performance & Scalable MPI library over Broadcom RoCE"
  Speakers: Mustafa Abduljabbar, The Ohio State University; Hemal Shah, Broadcom Inc; and Shulei Xu, The Ohio State University
  Time: 11:00-11:30am PT

Lunch and email break
  Time: 11:30am - 1:00pm PT

Session 5: "Scaling Large Language Model Training using Hybrid GPU-based Compression in MVAPICH"
  Speakers: Aamir Shafi
  and Lang Xu, The Ohio State University
  Time: 1:00-1:30pm PT

Session 6: "OFI Integrated Shared Memory Offload"
  Speakers: Alexia Ingerson, Intel; Shi Jin, Amazon; and Amir Shehata, Oak Ridge National Laboratories
  Time: 1:30-2:00pm PT

2:00pm PT - End of Day 1

Day 2: Tuesday, April 23, 8:00am - 2:00pm PT

Session 7: "Managing Composable Disaggregated Infrastructure With OFA Sunfish"
  Speakers: Christian Pinto, IBM Research Europe; Michael Aguilar, Sandia National Laboratories; Phil Cayton, Intel; Russ Herrell, Hewlett Packard Enterprise; and Brian Pan, H3 Platform
  Time: 8:00-8:30am PT

Session 8: "An Integrated Deep Reinforcement Learning Agent for Sunfish and HPC Workload Manager Composable Disaggregated Resource Scheduling"
  Speakers: Catherine Appleby and Michael Aguilar, Sandia National Laboratories
  Time: 8:30-9:00am PT

Session 9: "Cornelis Networks CN5000 Adapter and Software Update"
  Speaker: Dennis Dalessandro, Cornelis Networks
  Time: 9:00-9:30am PT

15-minute break
  Time: 9:30-9:45am PT

Session 10: "System Composability Using CXL"
  Speaker: Kurtis Bowman, CXL Consortium MWG Co-Chair
  Time: 9:45-10:15am PT

Session 11: "Optimized All-to-all Connection Establishment for High-Performance MPI Libraries over InfiniBand"
  Speakers: Mustafa Abduljabbar and Dhabaleswar Panda, The Ohio State University
  Time: 10:15-10:45am PT

Session 12: "RecoNIC: RDMA-enabled Compute Offloading on FPGA-based SmartNIC"
  Speaker: Guanwen Zhong, AMD
  Time: 10:45-11:15am PT

Session 13: "Designing In-Network Computing Aware Reduction Collectives in MPI"
  Speakers: Dhabaleswar Panda and Bharath Ramesh, The Ohio State University
  Time: 11:15-11:45am PT

Lunch and email break
  Time: 11:45am - 1:00pm PT

Tutorial: "How to setup RDMA CI using the FSDP cluster" and "How to do manual RDMA testing using the FSDP cluster"
  Speakers: Doug Ledford, Redhat; and Jeremy Spewock, UNH InterOperability Lab (IOL)
  Time: 1:00-1:55pm PT

Closing remarks
  Speaker: Doug Ledford
  Time: 1:55-2:00pm PT

2:00pm PT - End of Workshop

From Niyaz.Murshed at arm.com Tue Apr 30 08:16:05 2024
From: Niyaz.Murshed at arm.com (Niyaz Murshed)
Date: Tue, 30 Apr 2024 15:16:05 +0000
Subject: [libfabric-users] Libfabric questions
Message-ID:

Hello all,

I am trying to learn libfabric and have some basic questions. Please point me in the right direction.

I am running Ubuntu 22.04 with Mellanox ConnectX-7 NICs on an Arm platform (N1).

Question 1: Is the fi_mlx provider only available on x86?

Steps:
  Installed UCX (as required by fi_mlx: https://ofiwg.github.io/libfabric/v1.8.0/man/fi_mlx.7.html)
    # Library version: 1.18.0
  Installed libfabric

As per https://www.intel.com/content/www/us/en/developer/articles/technical/mpi-library-2019-over-libfabric.html this is only available for InfiniBand devices. However, one of the presentations shows that mlx also supports RoCE (https://ibb.co/bPhm0dp).

Could you please confirm whether the MLX provider will work on RoCE?
Other than UCX, do we need to install anything else to enable the MLX provider?

In my current setup, I only see the verbs provider, with domain mlx:

provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 121.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC

Is Verbs->mlx equivalent to a verbs API wrapper over the mlx provider?

Question 2: Is PSM3 only available for Intel NICs and only on the x86 platform?

I have the same installation on x86 and Arm. However, on the x86 platform, the Intel NICs have the following provider enabled:

provider: psm3
    fabric: TCP-10.118.91.0/24
    domain: eno1
    version: 306.10
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX3

The same is not present on the Arm platform.

Question 3: I am able to run fabtests, focusing especially on fi_rma_bw. However, even though I have verbs (mlx), it does not get selected.
The following is always selected instead:

fi_fabric_attr:
    name: IB-0xfe80000000000000
    prov_name: verbs;ofi_rxd
    prov_version: 121.0
    api_version: 1.20

If I disable rxd in libfabric, then the application errors out at fi_getinfo.

Question 4: My goal is to run an RDMA test between two applications (RoCE). I have two NICs on the server, one on numa0 and another on numa1. I have a loop cable from nic1 to nic2. How can I make sure that the data is transferred via the loop cable? Can this be done with the fi_rma_bw app?

Regards,
Niyaz
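
The provider listings quoted in the message above can be compared across machines more easily once they are split into records. The following is a minimal sketch, not part of libfabric or fabtests: the helper names `parse_records` and `split_layers` are hypothetical, the sample text simply concatenates the two listings quoted in the message (verbs from the Arm host, psm3 from the x86 host), and the semicolon in `verbs;ofi_rxd` is read here as a core provider layered with a utility provider.

```python
import re

# Field names that appear in the listings quoted in the message.
FIELDS = ("provider", "fabric", "domain", "version", "type", "protocol")

def parse_records(text):
    """Split flattened 'key: value' text into one dict per 'provider:' entry.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r"\b(%s):\s*(\S+)" % "|".join(FIELDS))
    records = []
    for key, value in pattern.findall(text):
        if key == "provider":
            # Each 'provider:' key starts a new record.
            records.append({})
        if records:
            records[-1][key] = value
    return records

def split_layers(prov_name):
    """Split a layered provider name such as 'verbs;ofi_rxd' into its parts."""
    return prov_name.split(";")

# Sample text: the two listings from the message, joined for illustration.
sample = (
    "provider: verbs fabric: IB-0xfe80000000000000 domain: mlx5_0 "
    "version: 121.0 type: FI_EP_MSG protocol: FI_PROTO_RDMA_CM_IB_RC "
    "provider: psm3 fabric: TCP-10.118.91.0/24 domain: eno1 "
    "version: 306.10 type: FI_EP_RDM protocol: FI_PROTO_PSMX3"
)

records = parse_records(sample)
for rec in records:
    print(rec["provider"], rec["domain"], rec["type"])
print(split_layers("verbs;ofi_rxd"))
```

Seen this way, the verbs entry in the listing is an FI_EP_MSG endpoint while the psm3 entry is FI_EP_RDM, and the selected `verbs;ofi_rxd` name has two components rather than one.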