FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop

Eric W. Biederman ebiederman at lnxi.com
Thu Feb 10 23:36:17 PST 2005


"Kanevsky, Arkady" <Arkady.Kanevsky at netapp.com> writes:

> For kDAPL:
> iSER has already been submitted to OpenIB by Voltaire.
> NFS-RDMA is at http://sourceforge.net/projects/nfs-rdma/.
> 
> For uDAPL: Oracle, DB2 and MPI.
> I am not aware if there is an open source MPI version on uDAPL.
> 
> These are publicly known.
> 
> As far as changing the uDAPL or kDAPL APIs:
> There are applications already written to them.

uDAPL at the API level is safe: it is in user space and is not a
kernel concern.

Looking at just the API header files, kDAPL is a non-starter for
kernel inclusion.

> There are implementations of these APIs on other platforms besides Linux.
> It is in nobody's interest to splinter the user community.

So why are these not sane extensions to the standard sockets APIs?
I do agree it does not make sense to further fragment things.

> We need the same API on all platforms.
In the kernel, not a chance.  That doesn't even make sense.

> If there is a good technical reason to change some specific APIs we
> should consider it.
> But the "burn the spec" approach is not a rational one.

Try doing a sensible implementation instead of "burn the spec"....

> As far as other transports: as people already mentioned, iWARP (IETF
> RDDP).

Just skimming the DAPL docs, DAPL does not yet appear to support
anything except IB.


> It is still not ready so we will start with gen2.
> But let's not lose sight of what DAPL brings:
> OS independent,
> Transport independent,
> RDMA APIs!!!

At user level that is probably great, if the abstraction layer is not
too heavy.  Except that MPI is so heavy that I wonder what the point of
doing all that is when MPI already provides it.

Inside an OS, OS independence is nonsense.  Sorry, kDAPL.

......

With that said, there is a fundamental issue with network packet reception
in IP.  A packet must first be DMA'd to memory to be examined and then
copied to its user, for a total of 3 copies through the memory bus to
get anywhere.  How to solve that problem while still being robust
against denial of service attacks and the other vagaries of hostile
public networks, and without going down the insane path of TCP offload,
is a challenge.

For dealing with very fast networks, the number of memory copies
starts to become a fundamental limit on how quickly things can
go.  Currently 4x IB runs at about 1/3 of the memory bandwidth
of a modern memory controller when you look at it bidirectionally.
As network interfaces continue their exponential increase in
speed, this will only get worse.
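
As a rough illustration of that arithmetic (these are not measured
figures): assuming 4x IB means 10 Gbit/s signalling per direction with
8b/10b encoding, and assuming a "modern memory controller" means
dual-channel DDR-400 at 6.4 GB/s peak, a small Python sketch lands near
the 1/3 figure and shows what three passes over the memory bus cost on
the receive side.  The constants and function names are hypothetical,
chosen only for this sketch.

```python
# Back-of-the-envelope numbers; every constant here is an assumption.
IB_4X_SIGNAL_GBIT = 10.0      # assumed: 4x IB signalling rate per direction
ENCODING_EFFICIENCY = 8 / 10  # assumed: 8b/10b line coding
MEMORY_BW_GBYTE = 6.4         # assumed: dual-channel DDR-400 peak, GB/s


def link_gbytes_per_sec(directions=2):
    """Payload bandwidth of the link in GB/s, summed over `directions`."""
    per_direction = IB_4X_SIGNAL_GBIT * ENCODING_EFFICIENCY / 8  # bits -> bytes
    return per_direction * directions


def memory_bus_fraction(link_gbytes, passes):
    """Fraction of peak memory bandwidth used when every byte of
    traffic crosses the memory bus `passes` times."""
    return link_gbytes * passes / MEMORY_BW_GBYTE


bidirectional = link_gbytes_per_sec(directions=2)
one_way = link_gbytes_per_sec(directions=1)

print(f"one pass, both directions:  {memory_bus_fraction(bidirectional, 1):.2f}")
print(f"three passes, receive only: {memory_bus_fraction(one_way, 3):.2f}")
```

Under these assumptions the bidirectional link alone accounts for
roughly a third of peak memory bandwidth, and a receive path that moves
each byte across the memory bus three times takes nearly half of it for
a single direction.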

The various flavors of RDMA seem to be a serious stab at fixing
the memory copy problem.  How well IP/RDMA or IB/RDMA will actually
fix the general problem I don't know, but it certainly deserves
a look, and probably a serious kernel design discussion.

Beyond that, from a practical user-level standpoint, the gen2 stack is
very much easier to use than gen1.  But gen2 still needs to be extended
so user space can do good implementations of uDAPL or MPI.

Getting RDMA kernel support, or just full-featured MPI support,
to user space from the Linux kernel is going to be a challenge.
Getting into the mainstream kernel requires quality work, especially
when things escalate from a single driver's private hack to the
next-generation version of the sockets API.  Kernel maintainers
can learn and can be convinced of new things, but the developers
have to be willing to do the same.  This is where running
the gauntlet, the real flame fest, begins.

Good Luck,

Eric




