[openib-general] gen2/rnic-pi differences
Caitlin Bestler
caitlinb at siliquent.com
Thu Jun 30 08:53:23 PDT 2005
This is a list of structural differences between OpenIB's
gen2 verbs and RNIC-PI that will remain even after gen2 is made
"transport neutral".
Once these distinctions are understood, a decision should be made as to
whether each difference reflects different objectives and should
merely be documented for the benefit of IHVs, or whether there should
be a migration to eliminate that specific difference.
For example, it might be decided that a given RNIC-PI feature was
inherently tied to other operating systems.
The response might be to merely note the difference, or to determine when
Linux could support such a feature.
Throughout this discussion the term "kVP" is used to refer to the
model/provider specific code that executes in the kernel, while "uVP"
references model/provider specific code that executes in user space.
Memory Registration / Lookups
gen2 translates virtual memory registrations
to physical lists before the kVP is invoked.
RNIC-PI expects the virtual memory registration
request to be passed to the kVP untranslated. The
kVP then makes a callback to obtain address
translations and to pin memory. Mapping and
pinning may be performed as separate steps,
allowing mapping of Consumer pinned memory.
gen2 does not currently support registration of
shared memory regions.
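To make the contrast concrete, the following is a minimal C sketch of the RNIC-PI style, in which the kVP receives the untranslated virtual range and calls back into the OS layer for each page. Everything here is hypothetical and taken from neither interface: the names `translate_fn`, `kvp_register_virtual` and `struct page_list`, the 4 KiB page size, and the fixed page-list capacity. Under gen2 the equivalent page list would already have been built before the kVP was invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define SKETCH_MAX_PAGES 64u     /* assumption: fixed page-list capacity */

/* Hypothetical OS-supplied callback: translate (and possibly pin) one
 * virtual page. Returns 0 on success and writes the physical address. */
typedef int (*translate_fn)(uintptr_t vaddr, uint64_t *paddr, void *os_ctx);

struct page_list {
    size_t   npages;
    uint64_t paddr[SKETCH_MAX_PAGES];
};

/* RNIC-PI style: the kVP sees the virtual range and asks the OS layer
 * for each translation itself. Returns 0 on success, -1 on failure. */
int kvp_register_virtual(uintptr_t vaddr, size_t len,
                         translate_fn translate, void *os_ctx,
                         struct page_list *out)
{
    if (len == 0)
        return -1;
    uintptr_t mask  = ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
    uintptr_t first = vaddr & mask;
    uintptr_t last  = (vaddr + len - 1) & mask;
    size_t npages = (last - first) / SKETCH_PAGE_SIZE + 1;
    if (npages > SKETCH_MAX_PAGES)
        return -1;
    for (size_t i = 0; i < npages; i++) {
        if (translate(first + i * SKETCH_PAGE_SIZE, &out->paddr[i], os_ctx) != 0)
            return -1;   /* a real kVP would also undo pages already pinned */
    }
    out->npages = npages;
    return 0;
}

/* Fake translator for illustration: adds a fixed offset to each address. */
static int fake_translate(uintptr_t vaddr, uint64_t *paddr, void *os_ctx)
{
    *paddr = (uint64_t)vaddr + *(const uint64_t *)os_ctx;
    return 0;
}
```

Because translation goes through a callback, an OS layer that keeps mapping and pinning separate could supply a translator that only maps memory the Consumer has already pinned.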
Locking
gen2 does not clearly state who is responsible
for preventing concurrent data access. Fastpath
operations must be callable from within an interrupt
or while holding a spinlock. Slowpath operations
are allowed to block.
It is not clear if gen2 would allow suppression of
locking when the caller has taken responsibility for
serializing all object access. RNIC-PI leaves division
of that responsibility to be worked out between the
Consumer and the Access Layer (DAT/IT-API).
RNIC-PI allows allocating slowpath operations to block,
but does not allow non-allocating slowpath operations or
fastpath operations to block or stall indefinitely. For
the latter cases it must be legal for the caller to hold
a spinlock over the call.
By default, RNIC-PI places responsibility for serializing
access to an object on the caller. RNIC-PI has tentatively
decided to allow a second optional set of verbs where the
verb layer will provide serialization.
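A minimal sketch of the two locking models follows, assuming a hypothetical `struct sketch_qp` and hypothetical verbs `post_send_unlocked` and `post_send_serialized`. The default verb relies on the caller to serialize access; the optional variant takes a lock inside the verb layer. A pthread mutex stands in here for the spinlock a kernel implementation would use so that the fastpath never sleeps.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical send-queue state; field names are illustrative only. */
struct sketch_qp {
    pthread_mutex_t lock;     /* used only by the serialized variant */
    unsigned        sq_tail;
    unsigned        sq_depth;
    unsigned        posted;
};

/* Default RNIC-PI model: the caller guarantees exclusive access, so the
 * verb takes no lock and never blocks. Returns -1 if the queue is full. */
int post_send_unlocked(struct sketch_qp *qp)
{
    if (qp->posted == qp->sq_depth)
        return -1;
    qp->sq_tail = (qp->sq_tail + 1) % qp->sq_depth;
    qp->posted++;
    return 0;
}

/* Optional second verb set: the verb layer provides the serialization.
 * In a kernel this would be a spinlock, not a sleeping mutex. */
int post_send_serialized(struct sketch_qp *qp)
{
    pthread_mutex_lock(&qp->lock);
    int rc = post_send_unlocked(qp);
    pthread_mutex_unlock(&qp->lock);
    return rc;
}
```

The serialized variant is just a thin wrapper, which is why offering it as an optional second set of verbs costs little for consumers that already serialize their own access.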
User-Mode Handles
gen2 never exports kernel pointers to user-mode, but rather
registers all such pointers as handles using standard routines.
All handles passed back in from user-mode are validated as
a by-product of translation back to kernel mode pointers.
RNIC-PI assigns that responsibility to the kAL (Kernel Access
Layer) but only requires *validation* of handles. It does not
explicitly address *translation* or placing them in a central
registry.
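The gen2 approach, where validation is a by-product of translation, can be illustrated with a small handle table. This is a simplified stand-in for such a registry, not gen2's actual code; `struct handle_table`, `handle_register`, `handle_lookup`, `handle_release`, the 16-slot table and the generation-in-high-bits encoding are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_HANDLES 16u   /* assumption: small fixed-size table */

/* Hypothetical registry: user mode only ever sees the 32-bit handle.
 * Encoding (assumed): high 16 bits = generation, low 16 bits = slot + 1,
 * so that 0 is never a valid handle. */
struct handle_table {
    void    *obj[SKETCH_MAX_HANDLES];
    uint16_t gen[SKETCH_MAX_HANDLES];   /* bumped on release to catch stale handles */
};

/* Register a kernel pointer; returns a handle, or 0 if the table is full. */
uint32_t handle_register(struct handle_table *t, void *obj)
{
    for (uint32_t i = 0; i < SKETCH_MAX_HANDLES; i++) {
        if (t->obj[i] == NULL) {
            t->obj[i] = obj;
            return ((uint32_t)t->gen[i] << 16) | (i + 1);
        }
    }
    return 0;
}

/* Translate a handle from user mode back to a kernel pointer. Any handle
 * that is out of range, unused or stale yields NULL, so validation falls
 * out of the translation step. */
void *handle_lookup(const struct handle_table *t, uint32_t h)
{
    uint32_t slot = h & 0xFFFFu;
    uint16_t gen  = (uint16_t)(h >> 16);
    if (slot == 0 || slot > SKETCH_MAX_HANDLES)
        return NULL;
    slot -= 1;
    if (t->obj[slot] == NULL || t->gen[slot] != gen)
        return NULL;
    return t->obj[slot];
}

/* Release a valid handle; invalid handles are ignored. */
void handle_release(struct handle_table *t, uint32_t h)
{
    if (handle_lookup(t, h) != NULL) {
        uint32_t slot = (h & 0xFFFFu) - 1;
        t->obj[slot] = NULL;
        t->gen[slot]++;
    }
}
```

A stale handle whose slot has since been reused fails to translate because its generation no longer matches, so no separate validation pass is needed.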
os_data / Identification of Consumer Objects
gen2 provides minimal support for identification of RDMA
resources using consumer supplied handles. A user-supplied
context is available in callbacks, but not in work completions.
RNIC-PI provides a general "os_data" capability that allows
each RNIC-PI object to have a consumer supplied alias that
is used for all queries (including on other objects), callbacks
and work completions.
The RNIC-PI approach can eliminate the need for reverse indexes
or per work request tracking data by the verbs consumer (such as
the "DTO_COOKIE" in the reference DAPL implementation).
This is of greatest concern when reaping a work completion in
user mode, as there is no way to translate a qp_num to a
QP object in user mode. It isn't that easy in kernel mode when
dealing with multiple vendors, either.
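The difference shows up in the consumer's completion path. In this sketch, where all types and functions are hypothetical, a consumer without os_data has to maintain its own reverse index from qp_num to its endpoint object, while a completion that carries the QP's os_data alias names that object directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer-side endpoint object (e.g. a DAPL endpoint). */
struct consumer_ep {
    const char *name;
};

/* Hypothetical work completion carrying the QP's os_data alias in
 * addition to the transport-level qp_num. */
struct sketch_wc {
    uint32_t qp_num;
    void    *qp_os_data;   /* consumer-supplied alias, set at QP creation */
    uint64_t wr_id;
};

/* Without os_data: the consumer keeps its own reverse index. */
struct qpn_map_entry {
    uint32_t            qp_num;
    struct consumer_ep *ep;
};

struct consumer_ep *lookup_by_qpn(const struct qpn_map_entry *map, size_t n,
                                  uint32_t qp_num)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].qp_num == qp_num)
            return map[i].ep;
    return NULL;
}

/* With os_data: the completion already names the consumer object. */
struct consumer_ep *ep_from_wc(const struct sketch_wc *wc)
{
    return (struct consumer_ep *)wc->qp_os_data;
}
```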
Work Request Opaques / Local Solicited / Threshold Solicited
RNIC-PI provides "Work Request Opaques" that allow the verbs
consumer (especially DAPL/IT-API) to mark certain work requests
with pass-through flags rather than using a parallel data
structure such as the DTO_COOKIE.
One of these flags allows a work request to be marked as
"local solicited", which will make it an urgent event (one
justifying a completion notification callback) when it completes
successfully (essentially setting the solicited bit locally).
IHVs can support these bits (a) not at all, (b) as pass-through,
or (c) by actually implementing the Local Solicited semantics in
hardware.
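One way an IHV that implements the semantics could decide whether a completion is urgent is sketched below. The flag name and bit value are hypothetical, and treating error completions as urgent is an assumption of the sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical work-request opaque flag bit (value is illustrative). */
#define WR_OPAQUE_LOCAL_SOLICITED (1u << 0)

enum sketch_status { WC_SUCCESS = 0, WC_ERROR = 1 };

/* Should this completion justify a completion-notification callback?
 * A successful completion is urgent only if its work request was marked
 * local solicited, as if the solicited bit had been set locally.
 * Assumption: error completions are always treated as urgent. */
bool completion_is_urgent(uint32_t wr_opaque, enum sketch_status status)
{
    if (status != WC_SUCCESS)
        return true;
    return (wr_opaque & WR_OPAQUE_LOCAL_SOLICITED) != 0;
}
```

An IHV choosing the pass-through option would skip this check and simply return the opaque bits in the completion, leaving the decision to the Access Layer.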
gen2 also provides a mechanism for requesting an earlier
completion callback than the verbs provide for, but it is more
akin to the DAT/IT-API evd threshold feature. RNIC-PI defines no
such feature, partially because callbacks always occur in the
kernel.
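For comparison, a threshold of the DAT/IT-API EVD kind can be modeled as a pending-completion counter. This is a generic, hypothetical sketch of the threshold concept (`struct sketch_evd`, `evd_add_completion`, `evd_reap`), not gen2's interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event dispatcher: waiters are woken only once at least
 * `threshold` completions are pending. */
struct sketch_evd {
    unsigned pending;
    unsigned threshold;
};

/* Record one new completion; returns true if waiters should be woken. */
bool evd_add_completion(struct sketch_evd *evd)
{
    evd->pending++;
    return evd->pending >= evd->threshold;
}

/* Consumer reaps up to `max` completions; returns how many were taken. */
unsigned evd_reap(struct sketch_evd *evd, unsigned max)
{
    unsigned n = evd->pending < max ? evd->pending : max;
    evd->pending -= n;
    return n;
}
```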
kernel callbacks
RNIC-PI only provides kernel callbacks. No callbacks are provided
in user mode. It is assumed that the callback routine is part of
the kAL (Kernel Access Layer) and that it will coordinate unblocking
of EVD waiters with the uAL (User Access Layer). This allows
optimized handling of many callback scenarios where the net effect
is to kick a file descriptor, wake another thread or take no
immediate action.
It also avoids uVPs having to add callback support, which they were
not required to have under the RDMAC verbs.
gen2 provides a standardized relay of callback notification events
from kernel mode to user mode.
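The kind of decision a kAL callback can make without any uVP involvement is sketched below. The action names and `struct kal_evd` are hypothetical; a real kAL would perform the wakeup or file-descriptor notification itself rather than return an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of a kernel completion callback, per the text. */
enum kal_action {
    KAL_NO_ACTION,     /* nobody is waiting; just record the event */
    KAL_WAKE_THREAD,   /* a thread is blocked waiting on the EVD */
    KAL_KICK_FD        /* a user process is polling a file descriptor */
};

/* Hypothetical per-EVD state kept by the kAL. */
struct kal_evd {
    unsigned events;            /* events not yet consumed */
    unsigned blocked_waiters;
    bool     fd_polled;
};

/* Hypothetical kAL callback invoked by the kVP on a completion event.
 * It runs entirely in the kernel, so the uVP needs no callback support. */
enum kal_action kal_cq_callback(struct kal_evd *evd)
{
    evd->events++;
    if (evd->blocked_waiters > 0)
        return KAL_WAKE_THREAD;
    if (evd->fd_polled)
        return KAL_KICK_FD;
    return KAL_NO_ACTION;
}
```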
ihv_data / model specific data
RNIC-PI defines an opaque pointer that can be used to communicate
model-specific data between the uVP and kVP.
gen2 allows vendors to add extra bytes both IN and OUT to each
verb request / response communicated over the user context fd.
The same information can be communicated effectively using
either approach.
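The gen2 style of appending vendor bytes can be sketched with a C flexible array member. The command layout, the field names, and the opcode value are hypothetical and do not reproduce the actual gen2 ABI; under RNIC-PI the same vendor data would instead sit behind the opaque pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic command header sent over the user context fd. */
struct sketch_cmd_hdr {
    uint32_t command;
    uint16_t in_words;    /* total request length in 32-bit words */
    uint16_t out_words;   /* space reserved for the response */
};

/* Hypothetical "create CQ" request: common fields followed by
 * vendor-specific bytes that only the kVP interprets. */
struct sketch_create_cq {
    struct sketch_cmd_hdr hdr;
    uint32_t cqe;
    uint32_t vendor_len;      /* bytes of vendor data that follow */
    uint8_t  vendor_data[];   /* flexible array member */
};

/* uVP side: build the request; returns NULL on allocation failure or
 * if the length does not fit the header. Caller frees the result. */
struct sketch_create_cq *build_create_cq(uint32_t cqe, const void *vendor,
                                         uint32_t vendor_len)
{
    size_t total = sizeof(struct sketch_create_cq) + vendor_len;
    if (total > 4u * (size_t)UINT16_MAX)
        return NULL;
    struct sketch_create_cq *cmd = malloc(total);
    if (cmd == NULL)
        return NULL;
    cmd->hdr.command   = 1;   /* hypothetical opcode */
    cmd->hdr.in_words  = (uint16_t)((total + 3) / 4);
    cmd->hdr.out_words = 0;
    cmd->cqe = cqe;
    cmd->vendor_len = vendor_len;
    if (vendor_len > 0)
        memcpy(cmd->vendor_data, vendor, vendor_len);
    return cmd;
}
```

The common fields stay identical for every vendor, and only the kVP that matches the uVP needs to understand the trailing bytes.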
user / kernel communications
gen2 creates a file descriptor for each open RDMA Device instance
(i.e., per device per client).
RNIC-PI does not define how the sysCall is implemented, but implies
that there is one per client (no matter how many devices).
Additional error information
gen2 provides for vendor-specific error information in work
completions.
RNIC-PI provides for additional error reporting through the
'err_data' opaque in a variety of contexts, but its contents are
OS-specific.
STag0 (or equivalent)
RNIC-PI has each rnic define a pre-existing
"all physical memory" memory region (STag0
for iWARP).
gen2 provides a verb for the kVP to create
such a memory region.
Equivalent results can be achieved with
either interface, and equivalent support
from the kVP is required in either case.
Doorbells
gen2 defines a standard method for mapping
the doorbell.
RNIC-PI presumes that this will be solved
between the uVP and kVP and that no standard
interface is required.
peek_cq
gen2 defines a method to peek at a cq (see
if there are more entries there without
attempting to reap a completion).
RNIC-PI does not define this, but it should
be feasible for almost all implementations.
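A peek can be implemented on a simple ring by comparing producer and consumer indices without advancing either. This sketch assumes a hypothetical `struct sketch_cq` with free-running indices; real hardware often marks new entries with ownership or valid bits instead of a shared producer index, in which case peeking means scanning entries.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CQ_DEPTH 8u   /* assumption: power-of-two ring size */

/* Hypothetical CQ ring: the producer advances `head`, the consumer
 * advances `tail` when it reaps an entry. Both indices run freely. */
struct sketch_cq {
    uint32_t head;
    uint32_t tail;
    uint64_t wr_id[SKETCH_CQ_DEPTH];
};

/* Peek: report how many completions are available, capped at `want`,
 * without consuming any of them. */
uint32_t cq_peek(const struct sketch_cq *cq, uint32_t want)
{
    uint32_t avail = cq->head - cq->tail;   /* unsigned wraparound is well defined */
    return avail < want ? avail : want;
}

/* Poll: reap one completion. Returns 1 and writes *wr_id, or 0 if empty. */
int cq_poll_one(struct sketch_cq *cq, uint64_t *wr_id)
{
    if (cq->head == cq->tail)
        return 0;
    *wr_id = cq->wr_id[cq->tail % SKETCH_CQ_DEPTH];
    cq->tail++;
    return 1;
}
```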