[libfabric-users] fi_read questions
arnestruck at astruck.de
Thu Oct 15 14:35:04 PDT 2020
>> The address (addr parameter) passed into fi_read() is the offset into the peer's
>> buffer. The address may either be 0-based (the default), or based on the virtual
>> address that the peer uses to access the memory. In the latter case, the
>> FI_MR_VIRT_ADDR mr_mode bit will be set.
>> I assumed that the mr_mode bits are set by the user. Since I don't set
>> FI_MR_VIRT_ADDR (in fact, I set none of the bits) so far, the mr_mode should be FI_MR_SCALABLE.
>> Thus it should be a 0-based offset (which my application sets to 0).
> The app is supposed to set which mr_mode bits it _can_ support. The provider will clear the bits that it doesn't require.
OK, so I should set FI_MR_VIRT_ADDR nevertheless. From what I read, the tcp
and sockets providers support this mode.
>> As before, with tcp, it can use a 0-based offset. But the designers of RDMA hardware decided that
>> having the base address start at the peer's virtual address made sense. So an
>> offset of 0 is indicated by specifying the peer's virtual address associated with the
>> start of the buffer.
>> As far as I know, the target system has no special RDMA hardware.
>> One way to handle this is for the peer to always exchange a base address with the
>> mr key. If FI_MR_VIRT_ADDR is set, the base address should equal the virtual address
>> of the memory buffer. If FI_MR_VIRT_ADDR is not set, the base address is 0. The
>> process that initiates the RMA then simply uses the key and base address it was
>> given.
>> What I can do is:
>> Verify that the mr_mode is indeed set to FI_MR_SCALABLE or another mr_mode.
> If you're using libfabric v1.9 (or anything after 1.5), think of FI_MR_SCALABLE as no longer existing. :) Use the individual mr_mode bit flags.
Will do so.
>> Send the key returned by fi_mr_key() rather than the keys I generated myself.
> fi_mr_key() is always usable. Even if the app specifies the key, it will just return the app's value.
OK, so basically 0 is a valid input and forces libfabric to generate a
value (if it doesn't do so automatically).
>> Any other idea what I could be doing wrong/check? What additional information could I
>> provide to help in the search for my misunderstanding?
> You mentioned that you were receiving errno ENOENT (no such file or directory). How were you receiving this? Error code returned from a call? As part of a completion? Which provider were you using?
It is returned from the fi_read() call.
I will update tomorrow; it is getting late around here. Thanks for the
help so far.
Greetings, Arne