[Openframeworkwg] Some concerns about the new Fabric/IBverbs API

Hefty, Sean sean.hefty at intel.com
Wed Dec 4 18:13:24 PST 2013


> 	> That kind of functionality is already provided by the shared library
> 	> systems in Linux. Couldn't each of the modules that we talked about
> 	> just be mapped to a shared library, with the API defined that way?
> 	> There are mechanisms in place to deal with compatibility issues.
> 
> 
> 	These APIs are usable together, not stand-alone.  The calls used to
> 	report events are used with the data transfer calls.  And different
> 	data transfer calls work with the same endpoint (e.g. message send and
> 	RDMA write).
> 
> 
> The same can be done using libraries if you create the proper
> dependencies.
> 
> 
> 
> 	> What are the specific requirements that prevent us from using the
> 	> standard library approach? If at all possible we should simply use
> 	> what is there.
> 
> 
> 	Can you provide more specific details on how you see this working?
> 	I'm not following what sort of changes or approach you're thinking
> 	about.
> 
> 
> 
> I am not clear on why we have this complicated API with the extension
> when the linker etc. already provide such a framework, with associated
> tools to manage it. There is a history, for example, of adding and
> modifying functions in glibc as well as in the kernel. Can we really not
> use the established means of defining user space APIs?

I'm suggesting that the function behind each call be selected at run time, based on the properties the application requests.  Function indirection replaces the switch/if statements inside the provider.

In place of:

struct ibv_context.ops.post_send(...)
	- one post_send call per open device, handling all QP types
	- the same call handles message sends, RDMA reads, RDMA writes, atomics, etc.

we have

struct fi_endpoint.ops.send_message(...)
	- send call associated with the endpoint (i.e. QP)
	- call only handles sending a message
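To make the contrast concrete, here is a minimal C sketch of the two shapes. All names (work_req, generic_post_send, ep_ops, endpoint, rc_send_message) are hypothetical stand-ins, not the real libibverbs or libfabric definitions, and the return values merely stand in for building a work descriptor. The point is only where the branch lives.

```c
#include <stddef.h>

/* Shape 1 (hypothetical): one entry point per device; the provider
 * branches on the opcode (and, in a real provider, the QP type) on
 * every single call. */
enum op { OP_SEND, OP_RDMA_WRITE, OP_RDMA_READ };

struct work_req {
    enum op opcode;
    const void *buf;
    size_t len;
};

static int generic_post_send(const struct work_req *wr)
{
    switch (wr->opcode) {
    case OP_SEND:       return 1; /* stand-in: build a send descriptor */
    case OP_RDMA_WRITE: return 2; /* stand-in: build an RDMA write descriptor */
    case OP_RDMA_READ:  return 3; /* stand-in: build an RDMA read descriptor */
    }
    return -1; /* unknown opcode */
}

/* Shape 2 (hypothetical): each endpoint carries its own ops table.  The
 * function installed as send_message only ever sends a message, so the
 * opcode switch disappears from the fast path. */
struct endpoint;

struct ep_ops {
    int (*send_message)(struct endpoint *ep, const void *buf, size_t len);
};

struct endpoint {
    const struct ep_ops *ops;
};

static int rc_send_message(struct endpoint *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 1; /* stand-in: build a send descriptor, nothing else */
}

static const struct ep_ops rc_ops = { .send_message = rc_send_message };
```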

Internally, the provider _can_ implement multiple send_message() calls, such as:

prov_send_ud_message(...) /* generic sends over a UD QP */
prov_send_rc_message(...) /* generic sends over an RC QP */
prov_send_buffered_message(...) /* copies buffer before sending */

endpoint.ops.send_message() would point to the best call.  Application requirements would indicate what types of calls and options the providers should optimize for, and providers can choose which applications they want to target.
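A sketch of how a provider might pick one of these variants when the endpoint is opened, assuming a hypothetical hints structure (ep_hints, with made-up type and app_may_reuse_buffer fields) and a hypothetical prov_open_endpoint(). The three send functions reuse the names above but are stubs; their return values just identify which one ran.

```c
#include <stddef.h>
#include <string.h>

struct endpoint;

struct ep_ops {
    int (*send_message)(struct endpoint *ep, const void *buf, size_t len);
};

enum qp_type { QP_UD, QP_RC };

/* Hypothetical application hints supplied when the endpoint is opened. */
struct ep_hints {
    enum qp_type type;
    int app_may_reuse_buffer; /* app wants to reuse buf as soon as send returns */
};

struct endpoint {
    const struct ep_ops *ops;
    char bounce[64]; /* staging area used only by the buffered variant */
};

static int prov_send_ud_message(struct endpoint *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 'U'; /* stub: generic send over a UD QP */
}

static int prov_send_rc_message(struct endpoint *ep, const void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return 'R'; /* stub: generic send over an RC QP */
}

static int prov_send_buffered_message(struct endpoint *ep, const void *buf, size_t len)
{
    if (len > sizeof(ep->bounce))
        return -1;
    memcpy(ep->bounce, buf, len); /* copy first so the caller may reuse buf */
    return 'B';                   /* posting from the bounce buffer is omitted */
}

static const struct ep_ops ud_ops = { prov_send_ud_message };
static const struct ep_ops rc_ops = { prov_send_rc_message };
static const struct ep_ops buffered_ops = { prov_send_buffered_message };

/* Slow path, run once per endpoint: choose the best send routine.  A real
 * provider would also distinguish UD from RC for the buffered case. */
static void prov_open_endpoint(struct endpoint *ep, const struct ep_hints *hints)
{
    if (hints->app_may_reuse_buffer)
        ep->ops = &buffered_ops;
    else if (hints->type == QP_UD)
        ep->ops = &ud_ops;
    else
        ep->ops = &rc_ops;
}
```

The choice is made once per endpoint, so each send costs one indirect call no matter how many variants the provider implements.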

The advantage that libibverbs has over dlopen today is that 'fast path' calls go directly into the provider library.  I don't know how to achieve that with dlopen; maybe there's a way...
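For reference, libibverbs gets this direct path because ibv_post_send() is a static inline function in <infiniband/verbs.h> that calls through qp->context->ops.post_send, while the provider library itself is loaded with dlopen outside the fast path. The sketch below mimics that pattern with hypothetical names (ep_send_message, demo_prov_send, demo_prov_open); as an assumption to keep it self-contained, the provider is linked in statically instead of being loaded with dlopen.

```c
#include <stddef.h>

struct endpoint;

struct ep_ops {
    int (*send_message)(struct endpoint *ep, const void *buf, size_t len);
};

struct endpoint {
    const struct ep_ops *ops;
    unsigned long sends; /* provider-private counter, for the demo */
};

/* Framework header: a static inline wrapper.  Because it is inlined into
 * the application, the call goes from the app straight into the provider
 * through one function pointer; no framework function sits in between. */
static inline int ep_send_message(struct endpoint *ep, const void *buf, size_t len)
{
    return ep->ops->send_message(ep, buf, len);
}

/* Provider code: in a real system this would live in a shared object
 * that the framework loads once (e.g. with dlopen). */
static int demo_prov_send(struct endpoint *ep, const void *buf, size_t len)
{
    (void)buf;
    ep->sends++;
    return (int)len; /* stand-in for posting the message */
}

static const struct ep_ops demo_prov_ops = { demo_prov_send };

/* Slow path, run once: the provider installs its ops in the endpoint. */
static void demo_prov_open(struct endpoint *ep)
{
    ep->ops = &demo_prov_ops;
    ep->sends = 0;
}
```

With plain dlopen, the nearest equivalent would be resolving each provider function once with dlsym() and calling through the returned pointer, which amounts to building this table by hand.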

We can separate 'ibv_post_send()' into multiple calls -- ibv_post_send_message, ibv_post_rdma_write, ibv_post_rdma_read, etc.  But that still doesn't give us the same amount of flexibility or optimization that we can achieve using what is essentially an object-oriented programming model.  Conceptually, we're defining C++ classes using C.  If we were to have libfabric export a C++ interface, I think the ideas would look natural.
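A minimal sketch of what "C++ classes using C" means here, with hypothetical names throughout: struct endpoint plays the base class, struct ep_ops its virtual method table, and a provider's struct prov_rc_endpoint the derived class. The provider recovers its derived object from the base pointer with the container_of idiom familiar from the Linux kernel, defined locally because userspace headers don't provide it.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct endpoint;

/* The base class's "virtual method table" (hypothetical). */
struct ep_ops {
    int (*send_message)(struct endpoint *ep, const void *buf, size_t len);
    int (*rdma_write)(struct endpoint *ep, const void *buf, size_t len,
                      unsigned long remote_addr);
};

/* The "base class" that the framework and the application see. */
struct endpoint {
    const struct ep_ops *ops;
};

/* A provider's "derived class": the base is embedded, followed by state
 * that only the provider knows about. */
struct prov_rc_endpoint {
    struct endpoint base;
    unsigned int qp_num;
    unsigned long bytes_sent;
};

static int prov_rc_send_message(struct endpoint *ep, const void *buf, size_t len)
{
    struct prov_rc_endpoint *rc = container_of(ep, struct prov_rc_endpoint, base);
    (void)buf;
    rc->bytes_sent += len; /* stand-in for posting a send on rc->qp_num */
    return 0;
}

static int prov_rc_rdma_write(struct endpoint *ep, const void *buf, size_t len,
                              unsigned long remote_addr)
{
    struct prov_rc_endpoint *rc = container_of(ep, struct prov_rc_endpoint, base);
    (void)buf; (void)remote_addr;
    rc->bytes_sent += len; /* stand-in for posting an RDMA write */
    return 0;
}

static const struct ep_ops prov_rc_ops = {
    .send_message = prov_rc_send_message,
    .rdma_write   = prov_rc_rdma_write,
};

/* "Constructor": initializes the derived object and hands back only the
 * base pointer, hiding the provider's type from the caller. */
static struct endpoint *prov_rc_endpoint_init(struct prov_rc_endpoint *rc,
                                              unsigned int qp_num)
{
    rc->base.ops = &prov_rc_ops;
    rc->qp_num = qp_num;
    rc->bytes_sent = 0;
    return &rc->base;
}
```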

I agree that we should use whatever standard mechanisms we can, and I don't believe the framework should turn into a complex dlopen construct.  But it's not clear to me how dlopen with 'flat' (non-object based) interfaces will give us the same performance or flexibility.


> VFIO could be used as a base API for getting access to the device. We
> may then need some additional syscalls to provide functionality that the
> existing VFIO interface does not.
> 
> Vendors are already able to write a user space driver based solely on
> the VFIO spec. If the device supports VFIO, then RDMA is basically also
> possible, by acquiring a VFIO instance of the device and managing the RX
> and TX queues in user space. The Mellanox driver supports VFIO, for
> example.
> 
> One problem may be that one VFIO instance is needed per process that uses
> offload/RDMA, and also per device. For our needs, however, that means we
> have full control of the device at the register level, which seems
> desirable to many of us and would allow full use of all the hardware's
> features. The API is vendor specific, though. What would be needed is a
> set of higher level libraries that let the vendor provide a user space
> library that ties into the RDMA framework, making these devices easy to
> use from user space.

One solution is to support multiple mechanisms for communicating between user space and the kernel, with the goal of migrating to VFIO.  The existing devices communicate either through ib_user_verbs or through proprietary means.  IMO, trying to move completely away from those to a new interface would likely take too much initial effort, since no existing devices would be supported without kernel changes.
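One way a provider could support several kernel interfaces at once, sketched with hypothetical types and stub probes: it lists the interfaces it can use in order of preference, with VFIO first, and the first one that probes successfully wins. A real probe might, for instance, check for /dev/vfio/vfio or a /dev/infiniband/uverbs* device node; the stubs below just pretend VFIO is not yet available for this device.

```c
#include <stddef.h>

/* Hypothetical kernel-access backends a provider might support. */
enum kern_iface { KI_VFIO, KI_IB_USER_VERBS, KI_PROPRIETARY, KI_NONE };

struct kern_backend {
    enum kern_iface iface;
    int (*probe)(void); /* returns nonzero if usable on this system */
};

/* Pick the first usable backend in the provider's preference order.
 * Listing VFIO first lets a device migrate once kernel support exists,
 * while existing devices keep working through ib_user_verbs or their
 * proprietary path. */
static enum kern_iface select_backend(const struct kern_backend *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].probe())
            return list[i].iface;
    return KI_NONE;
}

/* Stub probes for the demo: this device has no VFIO support yet. */
static int probe_vfio_absent(void) { return 0; }
static int probe_uverbs_present(void) { return 1; }
static int probe_proprietary_present(void) { return 1; }
```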

- Sean



More information about the ofiwg mailing list