[ofiwg] Minutes for OFIWG Meeting 2/4/2025
Xiong, Jianxin
jianxin.xiong at intel.com
Thu Feb 6 14:19:49 PST 2025
2/4/2025
Participants
Alexia Ingerson (Intel)
Ben Lynam (Cornelis)
Bob Cernohous (Cornelis Networks)
Charles Shereda (Cornelis)
Jerome Soumagne (HPE)
Jianxin Xiong (Intel)
Juee Desai (Intel)
Ken Raffenetti (ANL)
Peinan Zhang (Intel)
Sai Sunku (AWS)
Shi Jin (AWS)
Stephen Oost (Intel)
Steve Welch (HPE)
Zach Dworkin (Intel)
Nikhil Nanal (Intel)
Rajalaxmi Angadi (Intel)
Executive Summary
An overview of the rearchitecture of the OFI shm provider was presented. The existing shm provider has a few drawbacks: (1) the command queue can't hold command data after processing; (2) the inject pool has high overhead due to access contention; (3) the response queue has to be processed in order; (4) CMA-IPC fallback is difficult to implement. With the new design, a send-side command is used to allow the receiver to hold command data for later use; the inject buffer runs in parallel with the command queue to avoid pool management; the return queue allows out-of-order processing; and the CMA-IPC fallback implementation becomes simple because of the reusable command.
There was a brief discussion on the rx_size setting in fabtests. The intention was to limit the size of posted receive buffers so that they do not exceed the provider limit (as defined by max_msg_size). The outcome was that it would be simpler and better for the provider / driver to allow larger receive buffers to be posted. The limit really only needs to be applied to the tx side.
Details
OFI shm provider rearchitecture overview:
<< See attached slides >>
The discussion about rx_size in fabtests originated from this PR: https://github.com/ofiwg/libfabric/pull/10720. The initial goal was to limit the size of posted buffers based on the max_msg_size setting. However, the situation is more complicated because the same size affects both messaging and RMA buffers, while the intention is to limit only the recv buffer, not the RMA buffer. The true reason behind the original PR was that on some hardware, send/recv sizes are limited to the MTU size, but the RMA size limit is much larger. During the discussion, it was realized that there is no real reason to disallow posting a larger buffer for recv; it is sufficient to apply the limit on the sender side only. The issue can be resolved more easily by a minor change to the provider / driver.
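The sender-side-only limit can be illustrated with a minimal sketch. This is not the fabtests or provider code: the struct and helper names (xfer_limits, tx_msg_size, rx_post_size, rma_xfer_size) are hypothetical, and the limits stand in for values a provider would report, such as max_msg_size.

```c
#include <stddef.h>

/* Hypothetical limits, standing in for values a provider would report. */
struct xfer_limits {
    size_t max_msg_size; /* largest send/recv message, e.g. one MTU */
    size_t max_rma_size; /* largest RMA transfer, often much larger */
};

/* Sender side: clamp the send size to the message limit. */
static size_t tx_msg_size(const struct xfer_limits *lim, size_t requested)
{
    return requested < lim->max_msg_size ? requested : lim->max_msg_size;
}

/*
 * Receiver side: accept a posted buffer of any size. Because the sender
 * is clamped, an incoming message never exceeds max_msg_size, so an
 * oversized receive buffer is simply not filled completely.
 */
static size_t rx_post_size(const struct xfer_limits *lim, size_t requested)
{
    (void)lim; /* no limit applied on the receive side */
    return requested;
}

/* RMA: limited independently of the message limit. */
static size_t rma_xfer_size(const struct xfer_limits *lim, size_t requested)
{
    return requested < lim->max_rma_size ? requested : lim->max_rma_size;
}
```

With a 4096-byte message limit and a 1 MiB RMA limit, a 64 KiB buffer in this sketch would be posted for recv unchanged, sent as at most 4096 bytes, and transferred whole by RMA.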
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_shm.pptx
Type: application/vnd.openxmlformats-officedocument.presentationml.presentation
Size: 574109 bytes
Desc: new_shm.pptx
URL: <http://lists.openfabrics.org/pipermail/ofiwg/attachments/20250206/3296013c/attachment-0001.pptx>