[ofiwg] libfabric release v1.16.0, future schedule, and version 2.0

Hefty, Sean sean.hefty at intel.com
Fri Sep 30 16:22:06 PDT 2022


Libfabric version 1.16.0 is now available:

https://github.com/ofiwg/libfabric/releases/tag/v1.16.0

Current plans for future major releases are:

v1.17.0 - November 2022
v1.18.0 - March 2023
v1.19.0 - July 2023

Minor releases will occur as needed.

Libfabric 2.0
-------------
I wanted to start the discussion of moving to libfabric version 2.0 following the v1.19.0 release.  This is to give users of libfabric time to plan code migration.

What we want to do for a 2.0 release, including *if* we want to move in that direction, needs community discussion and agreement.  It will be part of future OFIWG meetings, and can also be discussed at the OFA workshop early next year.

But, of course, I wouldn't have brought up the idea if I hadn't already had some thoughts, so these are my thoughts:

2.0 should be a drop-in replacement for most existing binaries that were compiled and work against version 1.X (with X to be decided).  My hope is to simplify the API by removing little-used features and re-examining how narrow-use features are exposed.  The first objective is to list potential features to remove, identify whether there are users, and update the list accordingly.  In some cases, a proposed change may make sense but would have broad impact if backwards compatibility were not maintained.
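
As a rough illustration of the drop-in goal, the sketch below shows the kind of version check it implies.  It assumes the packing used by the FI_VERSION() macro (major in the upper 16 bits, minor in the lower 16) and assumes that "1.X" means 1.X or later.  The SK_* macros, sk_accepts_app_version(), and the cutoff of 16 are hypothetical stand-ins; the real X has not been decided.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the FI_VERSION() packing: (major << 16) | minor. */
#define SK_VERSION(major, minor) ((uint32_t)(((major) << 16) | (minor)))
#define SK_MAJOR(version) ((uint32_t)(version) >> 16)
#define SK_MINOR(version) ((uint32_t)(version) & 0xFFFFu)

/* Hypothetical cutoff: the "X" in 1.X is still to be decided. */
#define SK_MIN_COMPAT_MINOR 16u

/*
 * Returns true if a 2.0 library would accept an application requesting
 * API version app_version, under the assumptions stated above.
 */
static bool sk_accepts_app_version(uint32_t app_version)
{
	if (SK_MAJOR(app_version) == 2)
		return SK_MINOR(app_version) == 0;
	if (SK_MAJOR(app_version) == 1)
		return SK_MINOR(app_version) >= SK_MIN_COMPAT_MINOR;
	return false;
}
```

With this cutoff, an application requesting 1.16 or 1.19 would be accepted and one requesting 1.5 would not.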

Here's a first pass at potential features to examine, grouped by header files.  In most cases, the proposal is to see if the item can be removed.  I wanted this list to go out first via email, for broader exposure, but will also capture it in GitHub for better tracking.

fi_log.h

- remove enum fi_log_subsys
- remove FI_LOG_TRACE and FI_LOG_MAX

fi_prov.h

- Change fi_param_get_bool() to use bool instead of int
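
A minimal sketch of what the int-to-bool change could look like for callers.  Both functions are hypothetical stand-ins, not libfabric code: sk_param_get_bool_int() imitates an int-based getter with a deliberately simplified parser, and sk_param_get_bool() shows a bool-based variant with the same return convention.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for an int-based getter (not the libfabric function).
 * Stores 0 or 1 in *value and returns 0, or returns -1 if raw is NULL.
 * The parsing rule here is a simplification for illustration only.
 */
static int sk_param_get_bool_int(const char *raw, int *value)
{
	if (raw == NULL)
		return -1;
	*value = (raw[0] == '1' || raw[0] == 'y' || raw[0] == 't');
	return 0;
}

/* Hypothetical 2.0-style variant: same return code, bool out-parameter. */
static int sk_param_get_bool(const char *raw, bool *value)
{
	int tmp;
	int ret = sk_param_get_bool_int(raw, &tmp);

	/* Only write the output on success, as the int version does. */
	if (ret == 0)
		*value = (tmp != 0);
	return ret;
}
```

The point of the sketch is that an int * cannot simply be passed where a bool * is expected, so callers would need to change the type of the variable they pass in.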

fi_trigger.h

- Re-examine if we want to keep triggered ops

fi_tagged.h

- Keep as is

fi_rma.h

- Likely keep as is
- Possible for removal is rma_iov_count

fi_ext.h

- Keep as is
- Expand peer provider APIs for greater composability

fi_errno.h

- Keep as is

fi_eq.h

- Remove wait sets from API
- remove FI_WAIT_MUTEX_COND and FI_WAIT_POLLFD
- Remove poll sets from API
- remove fi_eq_attr::signaling_vector
- remove async memory registration (FI_MR_COMPLETE)
- remove async AV insertion (FI_AV_COMPLETE)
- remove fi_cq_wait_cond
- Can we simplify, rework, or remove counters

fi_endpoint.h

- remove FI_OPT_ENDPOINT options:
  BUFFERED_MIN, BUFFERED_LIMIT, SEND_BUF_SIZE, RECV_BUF_SIZE,
  TX_SIZE, RX_SIZE
- remove fi_ops_ep::rx_size_left / tx_size_left

fi_domain.h

- remove FI_SYMMETRIC
- remove fi_ops_av::insertsym / insertsvc
- remove fi_mr_modify
- remove FI_DATATYPE_LAST, FI_ATOMIC_OP_LAST

fi_collective.h

- Keep as is

fi_cm.h

- Keep as is
 
fi_atomic.h

- remove fi_ops_atomic::write_valid / readwritevalid / compwritevalid

fabric.h

- remove FI_PATH_MAX / FI_NAME_MAX / FI_VERSION_MAX
- remove FI_PROV_SPECIFIC (document instead)
- remove FI_MORE
- remove FI_PRIORITY (make internal)
- remove FI_AFFINITY
- remove FI_MATCH_COMPLETE
- FI_AV_USER_ID is not implemented (keep or remove?)
- remove FI_VARIABLE_MSG
- remove FI_SOURCE_ERR
- remove FI_NUMERICHOST
- Remove address formats for removed providers
- remove FI_MR_UNSPEC / FI_MR_BASIC / FI_MR_SCALABLE
- remove FI_THREAD_FID / FI_THREAD_COMPLETION / FI_THREAD_ENDPOINT
- remove fi_resource_mgmt (set to always enabled)
- remove FI_ORDER_NONE / FI_ORDER_STRICT
- remove FI_EP_SOCK_STREAM / FI_EP_SOCK_DGRAM
- Remove protocols for removed providers
- remove FI_MSG_PREFIX
- remove FI_ASYNC_IOV
- remove FI_LOCAL_MR
- remove FI_NOTIFY_FLAGS_ONLY
- remove FI_RESTRICTED_COMP
- remove FI_BUFFERED_RECV
- remove fi_tx_attr::comp_order (no order guarantees)
- remove fi_rx_attr::comp_order (no order guarantees)
- remove fi_rx_attr::total_buffered_recv
- remove fi_ep_attr::msg_prefix_size
- fi_ep_attr::max_order_raw / war / waw _size (look at alternatives)
- fi_ep_attr::mem_tag_format - replace with max_tag_bits
- remove fi_domain_attr::resource_mgmt (always enabled)
- fi_domain_attr - move/copy some cnt fields to nic attribute
  field removal has potential to impact most apps (want compatibility)
- remove fi_domain_attr::mr_iov_limit (define to 1)
- FI_SELECTIVE_COMPLETION
- fi_info - add provider specific context
- fi_info - always require that fi_info come from libfabric
  E.g. app must use fi_allocinfo or fi_dupinfo
- remove fi_alias / FI_ALIAS
- remove FI_GETFIDFLAG / FI_SETFIDFLAG
- Examine fi_deferred_work
- remove FI_REFRESH
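
For the mem_tag_format item in the list above, here is a minimal sketch of how a single bit count could be derived from the existing mask.  It assumes the documented reading of mem_tag_format, where leading zero bits are ignored and everything from the highest set bit down counts as tag bits.  max_tag_bits is only a proposed field, and sk_max_tag_bits() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Number of tag bits implied by a mem_tag_format value: the position of
 * the highest set bit plus one, or 0 if no bits are set.
 */
static unsigned sk_max_tag_bits(uint64_t mem_tag_format)
{
	unsigned bits = 0;

	/* Shift right until the value is exhausted, counting each step. */
	while (mem_tag_format != 0) {
		bits++;
		mem_tag_format >>= 1;
	}
	return bits;
}
```

For example, a mem_tag_format of 0x30FF maps to 14 bits; the division into separate fields that the mask can express is lost in the conversion.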

Finally, suggested providers to remove: bgq, gni, netdir (now part of verbs), psm, rstream, usnic.

Note that the proposal to remove items is based on implementation and guesses about use, not feature coolness factor.

- Sean

