[ofw] [PATCH 0/4] OFED compat libraries: enhance scalability
Sean Hefty
sean.hefty at intel.com
Tue Mar 10 16:28:32 PDT 2009
The current implementations of the libibverbs and librdmacm libraries
are limited in their scalability by their use of OS event objects to
report certain types of IB/RDMA events: asynchronous events on an HCA,
CQ completion events, and connection events.
The limitation stems from WaitForMultipleObjects, which can wait on at
most 64 event objects. It is easily seen by running rdma_cmatose with
the -c option and setting the number of connections > 64.
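For reference, a minimal sketch of the Win32 limit being described: passing
more than MAXIMUM_WAIT_OBJECTS (64) handles to a single call fails outright.

/* Sketch only: one OS event object per connection stops scaling at 64,
 * because WaitForMultipleObjects rejects larger handle arrays. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
	HANDLE events[65];
	DWORD i, ret;

	for (i = 0; i < 65; i++)
		events[i] = CreateEvent(NULL, FALSE, FALSE, NULL);

	/* nCount > MAXIMUM_WAIT_OBJECTS (64) fails with
	 * ERROR_INVALID_PARAMETER instead of waiting. */
	ret = WaitForMultipleObjects(65, events, FALSE, 0);
	if (ret == WAIT_FAILED)
		printf("WaitForMultipleObjects failed: %lu\n", GetLastError());

	for (i = 0; i < 65; i++)
		CloseHandle(events[i]);
	return 0;
}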
Because overlapped operations cannot share OS event objects, the best
(only?) viable scaling alternative is to use I/O completion ports (IOCPs).
However, since an open OS file object can be associated with only one
IOCP, that IOCP must be shared between the libibverbs and librdmacm
libraries. Additionally, this requires a mechanism on top of the IOCP
capable of de-multiplexing IB/RDMA events to the correct channel:
ibv_comp_channel, rdma_event_channel, or ibv_context (for async
events).
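As a rough illustration of the approach (not the patch code itself), a
single IOCP can carry completions for several logical channels by keying
each completion; the channel identifiers below are illustrative only.

/* Sketch of the de-multiplexing idea behind the COMP_CHANNEL layer:
 * one shared IOCP, with the completion key used to route each event
 * to the logical channel that owns it. */
#include <windows.h>
#include <stdio.h>

#define KEY_CQ_CHANNEL	1	/* stand-in for an ibv_comp_channel   */
#define KEY_CM_CHANNEL	2	/* stand-in for an rdma_event_channel */

int main(void)
{
	HANDLE iocp;
	DWORD bytes;
	ULONG_PTR key;
	OVERLAPPED *ov;
	int i;

	/* One IOCP shared by all event sources. */
	iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

	/* Simulate completions arriving from two different sources. */
	PostQueuedCompletionStatus(iocp, 0, KEY_CQ_CHANNEL, NULL);
	PostQueuedCompletionStatus(iocp, 0, KEY_CM_CHANNEL, NULL);

	for (i = 0; i < 2; i++) {
		if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
			break;
		switch (key) {
		case KEY_CQ_CHANNEL:
			printf("route to CQ completion channel\n");
			break;
		case KEY_CM_CHANNEL:
			printf("route to RDMA CM event channel\n");
			break;
		}
	}

	CloseHandle(iocp);
	return 0;
}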
Add a new abstraction, the COMP_CHANNEL, capable of de-multiplexing
overlapped completion events among multiple queues. The libibverbs and
librdmacm libraries are modified to use this abstraction for reporting
events.
In order to share objects between libibverbs and librdmacm, add
Windows-specific APIs to libibverbs to get and release Windows-specific
data. The COMP_CHANNEL abstraction is exposed directly to applications,
which gives them greater control over how events are monitored.
(This is similar to how fds are exposed in the OFED code, which DAPL
makes use of.)
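For comparison, the fd-based pattern referred to above looks roughly like
this on Linux/OFED; the COMP_CHANNEL is intended to give Windows
applications a similar degree of control over event monitoring.

/* Sketch: on Linux, an application can poll() the fd inside an
 * ibv_comp_channel and fold CQ events into its own event loop. */
#include <poll.h>
#include <infiniband/verbs.h>

/* Wait up to timeout_ms for a CQ event on 'channel', then ack it.
 * Returns 1 if an event was handled, 0 on timeout, -1 on error. */
static int wait_cq_event(struct ibv_comp_channel *channel, int timeout_ms)
{
	struct pollfd pfd = { .fd = channel->fd, .events = POLLIN };
	struct ibv_cq *cq;
	void *cq_ctx;
	int ret;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret;

	if (ibv_get_cq_event(channel, &cq, &cq_ctx))
		return -1;
	ibv_ack_cq_events(cq, 1);
	ibv_req_notify_cq(cq, 0);	/* re-arm for the next completion */
	return 1;
}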
Finally, in order to scale the number of connections, modify the IB CM
to drop, rather than reject, incoming REQs that are not immediately ready
for processing because of backlog limitations. This allows the REQ to
be retried and the connection established once the backlog has dropped.
This matches the behavior that was added to the Linux stack.
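Conceptually (with made-up names, not the actual IB CM code), the
listen-side decision becomes:

/* Illustrative sketch only: when the listen backlog is full, dropping
 * the REQ lets the sender's retry timer resend it later, whereas a
 * REJ would terminate the connection attempt outright. */
struct cm_listen {
	int backlog;		/* maximum pending REQs allowed     */
	int pending_reqs;	/* REQs queued but not yet accepted */
};

enum req_action { REQ_PROCESS, REQ_DROP };

static enum req_action cm_handle_req(struct cm_listen *listen)
{
	if (listen->pending_reqs >= listen->backlog)
		return REQ_DROP;	/* no REJ: peer retries the REQ */

	listen->pending_reqs++;
	return REQ_PROCESS;		/* hand the REQ to the listener */
}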
Signed-off-by: Sean Hefty <sean.hefty at intel.com>