[libfabric-users] mmap'ed kernel memory in fi_mr_reg
Jörn Schumacher
joern.schumacher at cern.ch
Thu Nov 15 02:48:33 PST 2018
On 10/26/2018 03:14 PM, Hefty, Sean wrote:
>> I am trying to register a special memory region using fi_mr_reg. This
>> is with the verbs provider and libfabric 1.6.2:
>>
>>> fi_mr_reg(socket->domain, buf->data, buf->size, FI_SEND, 0,
>>> socket->req_key++, 0, &buf->mr, NULL)
>>
>> buf->data is a virtual address that has been mmap'ed from a kernel
>> address in a custom kernel driver. The mapping uses
>> remap_pfn_range:
>>
>>> vma->vm_flags |= VM_DONTEXPAND;
>>> vma->vm_flags |= VM_DONTDUMP;
>>> vma->vm_flags |= VM_LOCKED;
>>> remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size, vma->vm_page_prot);
>>
>> The call to fi_mr_reg fails with -22 (Invalid argument). If I replace
>> the buffer with another buffer allocated simply with malloc, the call
>> succeeds.
>>
>> Does anybody know why this would not work with mmap'ed memory? Is
>> there a way of mmap'ing the kernel address to user space that would
>> allow the memory registration?
>
> I don't know without digging into the kernel. The OFI call in this case maps directly to the verbs call. You could try posting this to the linux-rdma mailing list asking about registering mmapped memory.
I would like to share the solution I found here, in case others run into
the same problem. I also have a feature request that perhaps the
libfabric developers can comment on; see towards the end.
After a discussion on the linux-rdma mailing list, no real solution was
found. The virtual addresses generated by remap_pfn_range in our driver
are incompatible with the RDMA drivers in the Linux kernel:
remap_pfn_range creates a VM_PFNMAP mapping with no struct page behind
it, so the get_user_pages call the kernel uses to pin memory for
registration fails on these addresses.
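For context, here is a minimal sketch of the kind of mmap handler our
driver implements (my_drv_mmap and the file_operations hookup are
illustrative, not our actual driver code; user space passes the physical
page frame number as the mmap offset):

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/module.h>

    /* Illustrative mmap handler: user space passes the physical pfn
     * as the mmap offset, and we map it with remap_pfn_range. */
    static int my_drv_mmap(struct file *filp, struct vm_area_struct *vma)
    {
            unsigned long size = vma->vm_end - vma->vm_start;

            vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_LOCKED;

            /* remap_pfn_range creates a VM_PFNMAP mapping: there is
             * no struct page behind these addresses, which is why
             * get_user_pages (and hence ibv_reg_mr) fails on them. */
            return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                                   size, vma->vm_page_prot);
    }

    static const struct file_operations my_drv_fops = {
            .owner = THIS_MODULE,
            .mmap  = my_drv_mmap,
    };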
However, the verbs extensions in the header verbs_exp.h provide
additional functionality. Among other things, this includes the
capability to register a physical address as an MR instead of a virtual
one. For Mellanox drivers this is described in [1]. The call is
ibv_exp_reg_mr with the IBV_EXP_ACCESS_PHYSICAL_ADDR flag.
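For reference, the registration looks roughly like this (a sketch
following the example in [1]; struct fields are per MLNX_OFED's
verbs_exp.h and may differ between versions):

    #include <string.h>
    #include <infiniband/verbs_exp.h>

    /* Register a physical-address MR on protection domain pd.
     * No virtual address range is given: addr/length stay zero and
     * the MR covers physical address space. */
    static struct ibv_mr *reg_phys_mr(struct ibv_pd *pd)
    {
            struct ibv_exp_reg_mr_in in;

            memset(&in, 0, sizeof(in));
            in.pd = pd;
            in.addr = NULL;
            in.length = 0;
            in.exp_access = IBV_EXP_ACCESS_LOCAL_WRITE |
                            IBV_EXP_ACCESS_REMOTE_READ |
                            IBV_EXP_ACCESS_REMOTE_WRITE |
                            IBV_EXP_ACCESS_PHYSICAL_ADDR;

            return ibv_exp_reg_mr(&in);
    }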
Using the physical address for the reg_mr call, we can then transfer
data directly from our custom PCIe card to the network adapter, without
any copies. Pretty nice!
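With such an MR, the address fields in work requests are interpreted as
physical addresses. Schematically (phys_addr and len are placeholders
for the bus address and size of the card's DMA buffer):

    struct ibv_sge sge = {
            .addr   = phys_addr,  /* physical address, not virtual */
            .length = len,
            .lkey   = mr->lkey,   /* lkey of the physical-address MR */
    };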
To use the ibv_exp_reg_mr call I had to patch libfabric slightly, see
[2]. Libfabric already uses ibv_exp_reg_mr if verbs_exp.h is available,
so only the flag needs to be added.
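Schematically, the change boils down to one extra flag in the verbs
provider's registration path (identifiers here are illustrative; see [2]
for the actual patch):

    #ifdef HAVE_VERBS_EXP_H
            /* illustrative: the provider already fills in an
             * ibv_exp_reg_mr_in struct here; the patch ORs in the
             * physical-address flag before calling ibv_exp_reg_mr */
            reg_in.exp_access |= IBV_EXP_ACCESS_PHYSICAL_ADDR;
            md->mr = ibv_exp_reg_mr(&reg_in);
    #endif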
As for my feature request: could this be added as an official feature to
libfabric? I am not sure how many other providers could actually support
physical-address memory registration, but it is a nice feature when
working with custom hardware and drivers. My patch is of course a bit
"crude" and would need to be implemented properly.
Thanks.
Cheers,
Jörn
[1] https://community.mellanox.com/docs/DOC-2480
[2] https://github.com/joerns/libfabric/compare/v1.6.x...joerns:phys_addr_mr