[libfabric-users] mmap'ed kernel memory in fi_mr_reg

Jörn Schumacher joern.schumacher at cern.ch
Thu Nov 15 02:48:33 PST 2018


On 10/26/2018 03:14 PM, Hefty, Sean wrote:
>> I am trying to register a special memory region using fi_mr_reg. This
>> is with the verbs provider and libfabric 1.6.2:
>>
>>>   fi_mr_reg(socket->domain, buf->data, buf->size, FI_SEND, 0,
>>> socket->req_key++, 0, &buf->mr, NULL)
>>
>> buf->data is a virtual address that has been mmap'ed from a kernel
>> address in a custom kernel driver. The mapping uses
>> remap_pfn_range:
>>
>>>    vma->vm_flags |= VM_DONTEXPAND;
>>>    vma->vm_flags |= VM_DONTDUMP;
>>>    vma->vm_flags |= VM_LOCKED;
>>>    remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size,
>>>                    vma->vm_page_prot)
>>
>> The call to fi_mr_reg fails with -22 (Invalid argument). If I replace
>> the buffer with another buffer allocated simply with malloc, the call
>> succeeds.
>>
>> Does anybody know why this would not work with mmap'ed memory? Is
>> there
>> a way of mmap'ing the kernel address to user space that would allow
>> the
>> memory registration?
> 
> I don't know without digging into the kernel.  The OFI call in this case maps directly to the verbs call.  You could try posting this to the linux-rdma mailing list asking about registering mmapped memory.

I would like to share the solution I found here, in case others run into 
the same problem. I also have a feature request that perhaps the 
libfabric developers can comment on; see towards the end.

After discussion on the linux-rdma mailing list, no real solution was 
found. The virtual addresses produced by remap_pfn_range in our driver 
are incompatible with the RDMA drivers in the Linux kernel: such 
mappings have no struct page backing, so the memory cannot be pinned 
for registration.
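For reference, the driver's mmap handler looks roughly like this (a 
minimal sketch with illustrative names; our actual driver differs in 
detail). remap_pfn_range marks the VMA as a raw PFN mapping, which is 
why pinning it during memory registration fails with -EINVAL:

```c
/* Sketch of a driver's file_operations.mmap handler using
 * remap_pfn_range. Names (mydev_mmap) are illustrative. */
#include <linux/mm.h>
#include <linux/fs.h>

static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
	size_t size = vma->vm_end - vma->vm_start;

	/* Keep the mapping fixed in size, out of core dumps, and locked. */
	vma->vm_flags |= VM_DONTEXPAND;
	vma->vm_flags |= VM_DONTDUMP;
	vma->vm_flags |= VM_LOCKED;

	/* Map device pages into user space. remap_pfn_range sets
	 * VM_PFNMAP on the VMA; pages mapped this way have no struct
	 * page, so the RDMA core cannot pin them with get_user_pages()
	 * and ibv_reg_mr (and hence fi_mr_reg) fails. */
	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
			       size, vma->vm_page_prot);
}
```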

However, the verbs extensions in the header verbs_exp.h provide 
additional functionality. Among other things, this includes the 
capability to register an MR by physical address instead of a virtual 
one. For Mellanox drivers this is described in [1]: the call is 
ibv_exp_reg_mr with the IBV_EXP_ACCESS_PHYSICAL_ADDR flag.
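In plain verbs terms, the registration looks roughly like the sketch 
below (following my reading of [1]; requires Mellanox OFED with the 
experimental verbs headers, and field usage may vary between versions):

```c
/* Sketch: register a physical-address MR via the Mellanox
 * experimental verbs API. Needs MOFED's infiniband/verbs_exp.h. */
#include <infiniband/verbs_exp.h>

static struct ibv_mr *reg_phys_mr(struct ibv_pd *pd)
{
	struct ibv_exp_reg_mr_in in = { 0 };

	in.pd = pd;
	in.addr = NULL;      /* no virtual address for a physical MR */
	in.length = 0;
	in.exp_access = IBV_EXP_ACCESS_LOCAL_WRITE |
			IBV_EXP_ACCESS_PHYSICAL_ADDR;
	in.comp_mask = 0;

	/* Returns an MR whose lkey can be used with physical
	 * addresses in scatter/gather entries. */
	return ibv_exp_reg_mr(&in);
}
```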

Using the physical address in the reg_mr call, we can then transfer 
data directly from our custom PCIe card to the network adapter, 
without any intermediate copies. Pretty nice!
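For completeness, a hedged sketch of how such an MR is then used: the 
scatter/gather entry carries a physical address (here a hypothetical 
card_phys_addr) together with the lkey of the physical MR.

```c
/* Sketch: post a send whose source is a physical address on the
 * PCIe card. card_phys_addr, len, phys_mr and qp are assumed to
 * come from the surrounding application. */
#include <infiniband/verbs.h>

static int send_from_card(struct ibv_qp *qp, struct ibv_mr *phys_mr,
			  uint64_t card_phys_addr, uint32_t len)
{
	struct ibv_sge sge = {
		.addr   = card_phys_addr, /* physical, not virtual */
		.length = len,
		.lkey   = phys_mr->lkey,
	};
	struct ibv_send_wr wr = {
		.sg_list    = &sge,
		.num_sge    = 1,
		.opcode     = IBV_WR_SEND,
		.send_flags = IBV_SEND_SIGNALED,
	};
	struct ibv_send_wr *bad_wr;

	return ibv_post_send(qp, &wr, &bad_wr);
}
```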

To use the ibv_exp_reg_mr call I had to patch libfabric slightly, see 
[2]. Libfabric already uses ibv_exp_reg_mr when verbs_exp.h is 
available, so only the flag needs to be added.

As for my feature request: could this be added as an official feature 
to libfabric? I am not sure how many other providers could actually 
support physical-address memory registration, but it is a useful 
capability when working with custom hardware and drivers. My patch is 
of course a bit "crude" and this would need a proper implementation.

Thanks.

Cheers,
Jörn


[1] https://community.mellanox.com/docs/DOC-2480
[2] https://github.com/joerns/libfabric/compare/v1.6.x...joerns:phys_addr_mr
