[ofw] port RDS to OFW - request for input...

Sean Hefty sean.hefty at intel.com
Wed Jul 22 16:28:39 PDT 2009


>Yes - we too are looking at this approach, and it is true that Oracle
>on Windows is a multi-threaded process. So we could share a single
>connection across all threads in a process and multiplex over it.

I don't know that this is the best approach, so don't take me as pushing for
this option.  Wouldn't this just be moving the Linux kernel RDS RDMA
implementation to run as a Windows library (as a winsock provider)?  Is there
anything that prevents multiple RDS connections from being used on the same
system (other than that it may be inefficient)?  This seems like the worst case
that could happen.

>a) We need to keep the IPC behavior the same on all platforms, as we
>have the same IPC clients running and inter-operating over them all.
>If we can do this while moving to user mode, that would work.
>
>b) Dynamic registration of local buffers on the fly requires a kernel
>mode transition. It is likely that we'd be doing this given the nature
>of our IPC clients, so any performance advantages of user mode may be
>lost.

I was thinking more about taking advantage of the OFED libibverbs and librdmacm
interfaces being available on Windows, and using the Windows calls that are
available in user space on pre-2008 systems.  (I'm guessing that you care about
running on pre-2008 systems.)

Winverbs shares responsibility for RDMA CM connections between the driver and
the user space library.  The driver implements the IB CM protocol for the RDMA
CM connections, but relies on the library to handle the address and route
resolution.

>c) RDS bcopy send operations complete immediately, i.e., they are
>synchronous from the RDS client's perspective. Ownership of the send
>buffer is returned to the client as soon as it is copied to a kernel
>buffer. Our RDS clients depend on this behavior. If we move to user
>mode and post sends directly, then we'd need to deal with asynchronous
>send completions, which introduce latency for send completions to the
>local client, etc.

Couldn't you just copy to a local buffer, possibly already registered?  Why does
it need to be a kernel buffer?
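A minimal sketch of that idea, assuming a small pool of send buffers registered with the HCA once at setup: the send call copies the caller's data into a free slot and posts it, so the caller gets its buffer back immediately even though the hardware completion arrives later. All names here (send_pool, bcopy_send, post_to_hca, on_send_complete) are hypothetical, and post_to_hca is a stub standing in for a real post-send on the registered region.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pool of send buffers, assumed to be registered with the
 * HCA once at setup time so no per-send registration is needed. */
#define SLOT_COUNT 4
#define SLOT_SIZE  4096

struct send_slot {
    char   data[SLOT_SIZE];
    size_t len;
    int    in_flight;   /* 1 while the HCA may still be reading the slot */
};

struct send_pool {
    struct send_slot slots[SLOT_COUNT];
    unsigned next;      /* where the search for a free slot starts */
};

/* Hypothetical stand-in for the real post-send call; it only counts posts. */
static int posts_issued;
static int post_to_hca(struct send_slot *slot) {
    (void)slot;
    posts_issued++;
    return 0;
}

/* Called when the send completion for a slot is reaped. */
static void on_send_complete(struct send_slot *slot) {
    slot->in_flight = 0;
}

/* bcopy-style send: copy into a free pre-registered slot and post it.
 * When this returns, the caller owns its buffer again.
 * Returns 0 on success, -1 if the message is too large, no slot is free
 * (caller retries after completions are reaped), or the post fails. */
static int bcopy_send(struct send_pool *pool, const void *buf, size_t len) {
    if (len > SLOT_SIZE)
        return -1;
    for (unsigned i = 0; i < SLOT_COUNT; i++) {
        struct send_slot *slot = &pool->slots[(pool->next + i) % SLOT_COUNT];
        if (!slot->in_flight) {
            memcpy(slot->data, buf, len);
            slot->len = len;
            slot->in_flight = 1;
            pool->next = (pool->next + i + 1) % SLOT_COUNT;
            int rc = post_to_hca(slot);
            if (rc != 0)
                slot->in_flight = 0;   /* post failed: slot is free again */
            return rc;
        }
    }
    return -1;
}
```

The copy costs the same as copying into a kernel buffer; the difference is only that completions are reaped in user space to recycle slots, rather than being visible to the client.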

>Use the winverbs driver for connection management only, and use ibal
>for data transfer ops?

I was thinking more of using the HCA interfaces for data transfer and something
else for connection management.  Winverbs does implement the IB CM protocol for
supporting the RDMA CM for user space.  I don't know if it would be easier to
expose a kernel interface, create a new driver to expose it, or just embed the
functionality into a kernel RDS driver.

Referring back to your original post, this does seem like functionality we'd
like to make available to other consumers if it's added, but IMO, the WinOF
supported interfaces should match up with Windows and not try to mimic
non-stable Linux kernel interfaces.  I envision a direct call interface that's
similar to the winverbs user space API, but with IRP in place of OVERLAPPED.

>We'd still need to sort out address translation?

Yet another option is to split the RDS functionality out similar to winverbs.
Let the address resolution occur in user space, with the results passed to the
kernel, which establishes the connection.
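A minimal sketch of that split, assuming user space fills in a resolved-route record and hands it to the kernel (for example through an ioctl), and the kernel refuses to start the IB CM exchange until it has one. The structure layout and the functions kernel_set_route and kernel_connect are hypothetical, not existing winverbs or WinOF interfaces.

```c
#include <stdint.h>

/* Hypothetical result of user-space address and route resolution. */
struct resolved_route {
    uint8_t  src_gid[16];
    uint8_t  dst_gid[16];
    uint16_t src_lid;
    uint16_t dst_lid;
    uint16_t pkey;
    uint8_t  port_num;
};

enum conn_state { CONN_IDLE, CONN_ROUTE_SET, CONN_CONNECTING };

/* Hypothetical kernel-side connection object. */
struct kernel_conn {
    enum conn_state       state;
    struct resolved_route route;
};

/* Kernel: accept the route passed down from user space.
 * LID 0 is reserved and IB ports are numbered from 1, so either value
 * being 0 is treated as "not resolved". */
static int kernel_set_route(struct kernel_conn *c, const struct resolved_route *r) {
    if (c->state != CONN_IDLE || r->dst_lid == 0 || r->port_num == 0)
        return -1;
    c->route = *r;
    c->state = CONN_ROUTE_SET;
    return 0;
}

/* Kernel: start the IB CM exchange; only allowed once a route is set. */
static int kernel_connect(struct kernel_conn *c) {
    if (c->state != CONN_ROUTE_SET)
        return -1;
    /* Sending the CM REQ using c->route is omitted in this sketch. */
    c->state = CONN_CONNECTING;
    return 0;
}
```

The kernel never performs address resolution itself; it only validates and stores what user space resolved, which keeps the kernel dependency on pre-2008 address-resolution interfaces out of the picture.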

I'm assuming that 2008 exposes the kernel interfaces that we require to
implement the rdma_cm.  I'm pretty sure that pre-2008 does not.  I'm not certain
whether we should support such an interface only on 2008+, or find a way to
handle older versions of Windows, possibly by communicating with a user space
service for address resolution.  (I don't know how one does that.)

- Sean
