[ofiwg] OFIWG Meeting Minutes 6/25/2024

Xiong, Jianxin jianxin.xiong at intel.com
Tue Jun 25 15:50:01 PDT 2024


OFIWG 6/25/2024

**** Participants ****

Alexia Ingerson (Intel)
Alex McKinley (Intel)
Ben Lynam (Cornelis)
Bob Cernohous
Charles Sherada (Cornelis)
Chien Tung (Intel)
Howard Pritchard (LANL)
Ian Ziemba (HPE)
Jerome Soumagne
Jianxin Xiong (Intel)
Juee Desai (Intel)
Peinan Zhang (Intel)
Rajalaxmi (Intel)
Shi Jin (AWS)
Stephen Oost (Intel)
Steve Welch (HPE)
Zach Dworkin (Intel)

**** Summary ****

Discussed the release schedule for July: the 1.22.0 feature release and the 1.21.1 bug fix release
are both targeted to begin the RC process in mid-July, with a GA target of end of July (7/26). Maintainers,
please begin cherry-picking or preparing patches for both the 1.21.x and 1.22.x branches.
The 2.0.0 alpha release is targeted to begin the RC process at the end of July, with a GA target of early August.

A new API capability/flag and new function calls were proposed to allow separation of memory
pinning/registration and key assignment for RMA. This would allow the application to limit
the scope and lifecycle of keys and to isolate key assignment and memory pinning as necessary.
Concerns were raised about how these new features would interact with the newly added
auth keys as well as the MR mode FI_MR_ENDPOINT, which requires enabling of MRs before
use. More discussion is needed when the PR is finalized to figure out how these features would
interact with each other.

**** Notes ****

** Release process **

Releases in July:

    Feature release 1.22.0:
        -  MR mode, msg size options
        -  EFA and OPX, profile provider bug fixes

    Bug fix release 1.21.1

    Pre-release 2.0.0 alpha:
        -  1.22.0 + API deprecations + 2.0 features

Release schedules:

    1.21.1
        -  RC1: 7/12
        -  RC2: 7/19
        -  GA: 7/26

    1.22.0
        -  RC1: 7/12
        -  RC2: 7/19
        -  GA: 7/26

    2.0.0 alpha
        -  RC1: 7/26
        -  GA: 8/2

Clarification: RC2 is not necessary if RC1 is OK; the schedule might change depending on CI issues.

** Memory registration with separate key assignment **

Two steps of memory registration:
    -  Map and pin for local access (kernel must be involved)
    -  Exporting for remote access
        -  Kernel may be involved in allocating keys
        -  User space may configure with kernel access
            -  Associate key, set permissions and possibly an ACL

Today, these steps are combined. Separating can have benefits for supported hardware:
    -  Better security with lower cost
        -  Different keys with different peers
        -  Different keys for subregion
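
To make the "different keys with different peers / for subregion" benefit concrete, here is a
minimal sketch in C of what a scoped export could look like once pinning and key assignment
are separate. Everything named toy_* / TOY_* is hypothetical and exists only for this
illustration; it is not libfabric API and says nothing about how a provider would implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits for this sketch (not libfabric's FI_REMOTE_* values). */
enum { TOY_REMOTE_READ = 1u << 0, TOY_REMOTE_WRITE = 1u << 1 };

/* One exported window of a region that has already been mapped and pinned. */
struct toy_export {
    uint64_t key;      /* remote key handed to one peer */
    int      peer;     /* id of the only peer allowed to use the key */
    size_t   offset;   /* window start within the pinned region */
    size_t   len;      /* window length in bytes */
    unsigned access;   /* allowed TOY_REMOTE_* bits */
};

/* Return true if `peer` may perform `access` on [offset, offset + len) using `key`. */
bool toy_export_allows(const struct toy_export *e, uint64_t key, int peer,
                       size_t offset, size_t len, unsigned access)
{
    assert(e != NULL);
    if (key != e->key || peer != e->peer)
        return false;                 /* wrong key or wrong peer */
    if ((access & e->access) != access)
        return false;                 /* asks for more than was granted */
    if (offset < e->offset || len > e->len)
        return false;                 /* starts before the window or is too long */
    /* Overflow-safe form of: offset + len <= e->offset + e->len */
    return offset - e->offset <= e->len - len;
}
```

With one pinned region, an application could hand peer 1 a read-only key covering bytes
1024..1535 and give a different key (or none) to peer 2, without pinning the memory twice.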

Proposal:
    -  New flag for mr calls: FI_MR_DYNAMIC_KEY
        -  When set, MR doesn’t have key assigned on registration
            -  fi_mr_desc(mr) still returns desc that can be used locally
            -  fi_mr_key(mr) returns an invalid key
        -  Also serves as capability bit
    -  New calls to allocate/assign/revoke keys:
        -  fi_mr_alloc_keys(mr, count, keys)
        -  fi_mr_assign_key(mr, offset, len, accs, auth_key_size, auth_key, key)
        -  fi_mr_revoke_key(mr, key)
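
The intended key life cycle can be sketched with a small stand-alone model in C. This is not
the proposed libfabric implementation: the toy_* functions below are hypothetical stand-ins
that loosely mirror the parameter lists above, with the access flags and auth key arguments
omitted for brevity. Two behaviors are assumptions of the sketch and were not stated in the
meeting: a revoked key stays allocated and can be assigned again, and a key can only be
assigned to one range at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_KEY_INVALID UINT64_MAX   /* stand-in for "no key assigned" */
#define TOY_MAX_KEYS    8

struct toy_key_slot {
    uint64_t key;
    size_t   offset, len;   /* sub-range of the MR covered by the key */
    int      allocated;     /* reserved via toy_mr_alloc_keys() */
    int      assigned;      /* currently valid for remote access */
};

struct toy_mr {
    size_t   len;           /* size of the pinned region */
    uint64_t next_key;
    struct toy_key_slot slots[TOY_MAX_KEYS];
};

/* "Register" a region: pinned for local use, no remote key yet. */
void toy_mr_init(struct toy_mr *mr, size_t len)
{
    assert(mr != NULL);
    memset(mr, 0, sizeof(*mr));
    mr->len = len;
    mr->next_key = 1;
}

/* Models the proposal's note that the MR itself carries no valid key. */
uint64_t toy_mr_key(const struct toy_mr *mr)
{
    (void)mr;
    return TOY_KEY_INVALID;
}

static struct toy_key_slot *toy_find(struct toy_mr *mr, uint64_t key)
{
    for (size_t i = 0; i < TOY_MAX_KEYS; i++)
        if (mr->slots[i].allocated && mr->slots[i].key == key)
            return &mr->slots[i];
    return NULL;
}

/* Reserve `count` keys; all-or-nothing. Returns 0 on success, -1 otherwise. */
int toy_mr_alloc_keys(struct toy_mr *mr, size_t count, uint64_t *keys)
{
    size_t free_slots = 0, n = 0;

    assert(mr != NULL && keys != NULL);
    for (size_t i = 0; i < TOY_MAX_KEYS; i++)
        free_slots += !mr->slots[i].allocated;
    if (count > free_slots)
        return -1;
    for (size_t i = 0; i < TOY_MAX_KEYS && n < count; i++) {
        if (mr->slots[i].allocated)
            continue;
        mr->slots[i] = (struct toy_key_slot){ .key = mr->next_key++, .allocated = 1 };
        keys[n++] = mr->slots[i].key;
    }
    return 0;
}

/* Make `key` valid for [offset, offset + len) of the MR. */
int toy_mr_assign_key(struct toy_mr *mr, size_t offset, size_t len, uint64_t key)
{
    struct toy_key_slot *s = toy_find(mr, key);

    if (!s || s->assigned)
        return -1;
    if (offset > mr->len || len > mr->len - offset)
        return -1;                     /* range must lie inside the MR */
    s->offset = offset;
    s->len = len;
    s->assigned = 1;
    return 0;
}

/* Invalidate `key` without touching the pinned region. */
int toy_mr_revoke_key(struct toy_mr *mr, uint64_t key)
{
    struct toy_key_slot *s = toy_find(mr, key);

    if (!s || !s->assigned)
        return -1;
    s->assigned = 0;
    return 0;
}

int toy_mr_key_usable(struct toy_mr *mr, uint64_t key)
{
    struct toy_key_slot *s = toy_find(mr, key);
    return s && s->assigned;
}
```

In this model, rotating a key is a revoke followed by an assign; the region is never closed
or re-pinned, which is the lightweight property the proposal is after.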

Q: How is this more secure?
A security protocol might want to limit the scope and life cycle of a key. With this proposal, the key can be
changed without having to close the MR, which is much more lightweight.

Q: Can you clarify why the proposal has it as both an MR mode and capability? Why not just one?
New proposal is not for mode. Renamed to FI_DYNAMIC_MR_KEY to clarify. Just a flag passed into the
MR registration call to request setting key separately – application requests that support in capabilities.
HPE would be interested in supporting.

Q: How does it interact with auth key and FI_MR_ENDPOINT? Now you can have multiple auth keys per EP.
Need to resolve issue with auth keys. Not sure how to have them work together.

MR_ENDPOINT should be fine – association of the key should be done before EP enabling. The limitation is that
you can’t use the key without assigning it. The remote key binding can be delayed. The important point is that
fi_mr_enable should be possible without key assignment.
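
One way to read the point about fi_mr_enable is as a small state model, sketched below in C.
The toy_ep_mr type and its functions are hypothetical and only illustrate the reading that
enabling depends on the MR being bound, while remote use additionally depends on a key being
assigned. The sketch does not settle the ordering relative to EP enabling or the auth key
questions, which were left open for the PR discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of an MR under an FI_MR_ENDPOINT-style mode. */
struct toy_ep_mr {
    bool bound;     /* MR has been bound to an endpoint */
    bool enabled;   /* MR has been enabled */
    bool has_key;   /* a remote key is currently assigned */
};

/* Enabling requires a bound MR but, in this sketch, no assigned key. */
int toy_ep_mr_enable(struct toy_ep_mr *m)
{
    assert(m != NULL);
    if (!m->bound)
        return -1;
    m->enabled = true;
    return 0;
}

/* Local use (via the descriptor) only needs the MR bound and enabled. */
bool toy_ep_mr_local_ok(const struct toy_ep_mr *m)
{
    return m->bound && m->enabled;
}

/* Remote use additionally needs an assigned key. */
bool toy_ep_mr_remote_ok(const struct toy_ep_mr *m)
{
    return toy_ep_mr_local_ok(m) && m->has_key;
}
```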

Will discuss auth key and MR_ENDPOINT issues more when PR comes out.

If you have anything you need for releases (1.22.0 or 1.21.1) please open PRs and cherry-pick what you need.

(Thanks to Alexia for taking the notes!)

-Jianxin

