[ofiwg] ofi_hmem_ops question

Ingerson, Alexia alexia.ingerson at intel.com
Thu Nov 3 10:19:13 PDT 2022


dest, src, and ptr can each point to host or device memory. dest and src indicate the direction of a copy; ptr is used when there is no direction.
dev_buf indicates that the pointer must refer to device memory (for example, when you get an IPC handle, it cannot be host memory). In some APIs (ZE, CUDA) it must be a base address. I am not sure whether that holds for all APIs, which I believe is why the naming is more generic.
ipc_ptr is the returned pointer (to device memory) that a peer process has opened for an IPC copy. It can be used like a regular ptr, but it is named ipc_ptr to show that the process using it did not allocate it and does not own it.
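
To make the dev_buf / base / ipc_ptr relationship concrete, here is a rough sketch of the IPC flow. It is not libfabric code: struct ipc_ops is a trimmed copy of the quoted members, and the mock_* functions and demo_ipc_roundtrip() are hypothetical stand-ins that use a host allocation in place of device memory. The mock assumes the ZE/CUDA-style rule that get_handle only accepts a base address, so the caller first resolves ptr to base and then passes base as dev_buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed copy of the IPC-related members quoted from struct ofi_hmem_ops. */
struct ipc_ops {
	int (*get_handle)(void *dev_buf, void **handle);
	int (*open_handle)(void **handle, uint64_t device, void **ipc_ptr);
	int (*close_handle)(void *ipc_ptr);
	int (*get_base_addr)(const void *ptr, void **base, size_t *size);
};

/* Hypothetical mock: one host allocation stands in for device memory. */
static struct { char *base; size_t size; } mock_alloc;

static int mock_get_base_addr(const void *ptr, void **base, size_t *size)
{
	uintptr_t p = (uintptr_t) ptr, b = (uintptr_t) mock_alloc.base;

	if (!mock_alloc.base || p < b || p >= b + mock_alloc.size)
		return -1;
	*base = mock_alloc.base;
	*size = mock_alloc.size;
	return 0;
}

static int mock_get_handle(void *dev_buf, void **handle)
{
	/* Assumption: only the base address of the allocation is accepted. */
	if (!dev_buf || dev_buf != (void *) mock_alloc.base)
		return -1;
	*handle = dev_buf;
	return 0;
}

static int mock_open_handle(void **handle, uint64_t device, void **ipc_ptr)
{
	(void) device;
	*ipc_ptr = *handle;	/* a real peer would receive its own mapping */
	return 0;
}

static int mock_close_handle(void *ipc_ptr)
{
	(void) ipc_ptr;		/* the opener never frees; the owner does */
	return 0;
}

/* Returns 0 if the round trip behaves as described, -1 otherwise. */
static int demo_ipc_roundtrip(void)
{
	struct ipc_ops ops = {
		.get_handle = mock_get_handle,
		.open_handle = mock_open_handle,
		.close_handle = mock_close_handle,
		.get_base_addr = mock_get_base_addr,
	};
	char *region = calloc(1, 64);
	void *ptr, *base, *handle, *ipc_ptr;
	size_t size, offset;
	int ret = -1;

	if (!region)
		return -1;
	mock_alloc.base = region;
	mock_alloc.size = 64;
	ptr = region + 16;	/* ptr: any address inside the allocation */

	if (ops.get_handle(ptr, &handle) == 0)
		goto out;	/* the mock must reject an interior address */
	if (ops.get_base_addr(ptr, &base, &size))
		goto out;
	offset = (size_t) ((char *) ptr - (char *) base);
	if (ops.get_handle(base, &handle))	/* base is passed as dev_buf */
		goto out;
	if (ops.open_handle(&handle, 0, &ipc_ptr))
		goto out;
	memcpy((char *) ipc_ptr + offset, "hi", 3);	/* used like a regular ptr */
	if (ops.close_handle(ipc_ptr))
		goto out;
	ret = (size == 64 && strcmp(ptr, "hi") == 0) ? 0 : -1;
out:
	mock_alloc.base = NULL;
	mock_alloc.size = 0;
	free(region);
	return ret;
}
```

Calling demo_ipc_roundtrip() from a test or main() should return 0 with this mock. In a real provider the handle would be sent to the peer process, and ipc_ptr would be a separate mapping rather than the same address.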

addr is pretty much the same as ptr, so it can be renamed to align.
We could also align the buf/ptr naming so that ptr, addr, and dev_buf match.
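
As a small illustration of why such a rename should be safe: in C, parameter names are not part of a function pointer's type, so existing implementations would keep compiling. The sketch below assumes dev_ptr as the replacement for dev_buf (a hypothetical name, nothing has been agreed), and the typedefs, mock functions, and rename_is_cosmetic() are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signatures with the parameter names used in the quoted struct. */
typedef bool (*is_addr_valid_old_fn)(const void *addr, uint64_t *device,
				     uint64_t *flags);
typedef int (*get_handle_old_fn)(void *dev_buf, void **handle);

/* Hypothetical aligned names: addr -> ptr, dev_buf -> dev_ptr. */
typedef bool (*is_addr_valid_new_fn)(const void *ptr, uint64_t *device,
				     uint64_t *flags);
typedef int (*get_handle_new_fn)(void *dev_ptr, void **handle);

/* Mock implementations, only here to have something to point at. */
static bool mock_is_addr_valid(const void *ptr, uint64_t *device,
			       uint64_t *flags)
{
	if (!ptr)
		return false;
	if (device)
		*device = 0;
	if (flags)
		*flags = 0;
	return true;
}

static int mock_get_handle(void *dev_ptr, void **handle)
{
	if (!dev_ptr || !handle)
		return -1;
	*handle = dev_ptr;
	return 0;
}

/* Returns 0 if old- and new-named pointer types are interchangeable. */
static int rename_is_cosmetic(void)
{
	is_addr_valid_old_fn valid_old = mock_is_addr_valid;
	is_addr_valid_new_fn valid_new = valid_old;	/* same type, no cast */
	get_handle_old_fn handle_old = mock_get_handle;
	get_handle_new_fn handle_new = handle_old;	/* same type, no cast */
	uint64_t device = 1, flags = 1;
	void *h_old = NULL, *h_new = NULL;
	int buf = 0;

	if (valid_old(&buf, &device, &flags) != valid_new(&buf, &device, &flags))
		return -1;
	if (handle_old(&buf, &h_old) || handle_new(&buf, &h_new))
		return -1;
	return h_old == h_new ? 0 : -1;
}
```

So the rename would only affect readability and documentation, not the ABI or any provider implementing the ops.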

-----Original Message-----
From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Hefty, Sean
Sent: Thursday, November 3, 2022 9:52 AM
To: ofiwg at lists.openfabrics.org
Subject: [ofiwg] ofi_hmem_ops question

For reference, these are the abstractions to support the half dozen device APIs known to libfabric:

struct ofi_hmem_ops {
	bool initialized;
	int (*init)(void);
	int (*cleanup)(void);
	int (*copy_to_hmem)(uint64_t device, void *dest, const void *src,
			    size_t size);
	int (*copy_from_hmem)(uint64_t device, void *dest, const void *src,
			      size_t size);
	bool (*is_addr_valid)(const void *addr, uint64_t *device, uint64_t *flags);
	int (*get_handle)(void *dev_buf, void **handle);
	int (*open_handle)(void **handle, uint64_t device, void **ipc_ptr);
	int (*close_handle)(void *ipc_ptr);
	int (*host_register)(void *ptr, size_t size);
	int (*host_unregister)(void *ptr);
	int (*get_base_addr)(const void *ptr, void **base, size_t *size);
	bool (*is_ipc_enabled)(void);
	int (*get_ipc_handle_size)(size_t *size);
};

Can someone tell me the difference between dest, src, addr, dev_buf, ipc_ptr, ptr, and base parameters?

Dest, src, and addr appear to be virtual addresses *within* some allocated memory region.  The region may be allocated on either the host or device.  I get the name difference here.

Discussions in https://github.com/ofiwg/libfabric/pull/8199 indicate the same may not be true for the other parameters, and it's unclear to me if the same type of parameter is going by different names based on which call is invoked.  E.g. do base and dev_buf reference the same memory?

- Sean


