[openib-general] Sockets Extensions Completed
Michael Krause
krause at cup.hp.com
Fri Jan 7 15:50:56 PST 2005
At 12:21 PM 1/7/2005, Libor Michalek wrote:
>On Fri, Jan 07, 2005 at 11:03:47AM -0800, Josh England wrote:
> > This is probably not the right forum, but how would you say the ES-API
> > compares with DAPL?
>
> ES-API appears to be much closer to the Linux AIO extensions than to
>DAPL. DAPL provides an explicit RDMA API
DAPL and IT API (a superset of DAPL) are designed to match RDMA hardware
semantics as closely as possible without requiring the application to be
coded to the verbs interface, which may not be optimal for the typical
developer. Verbs interfaces like VAPI or the RNIC PI (a standard verbs
interface) are designed to be "to the metal" interfaces, which are likely
to be reasonable for only a subset of developers to use.
Sockets and the Extended Sockets API are designed to provide a general
purpose network API that can take advantage of existing and new
communication paradigms. The extensions were developed after close
evaluation of existing interfaces such as AIO, interface research, etc. and
working through their benefits, shortcomings, etc. to find the right
balance in getting to the desired benefits. The extensions are able to
operate over a variety of network stack implementations, enabling good
application portability / interoperability. Where an OS determines that it
can take advantage of RDMA, the extended Sockets provide opportunities such
as explicit memory management to make the mapping to RDMA as efficient as
possible, keeping both implementation complexity and performance loss to a
minimum.
>for applications while AIO and ES-API allow for asynchronous operations on
>normal sockets, so you can
>perform standard socket operations, plus asynchronous operations. AIO and
>ES-API still maintain streaming socket semantics, while DAPL provides full
>RDMA capabilities, such as explicit memory placement.
Sockets / ES with SDP provides explicit memory placement but the
application developer does not have to worry about all of the details or
the interconnect-specifics. This has a number of advantages in terms of
broadening the application of RDMA interconnects to the larger application
space while still providing strong performance gains compared to a standard
network software implementation.
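
As a rough illustration of the programming model discussed above, here is a
minimal sketch in Python using only the standard library. It is an analogy,
not the ES-API: readiness events from `selectors` stand in for an event
queue, and `recv_into` on a buffer allocated once stands in for receiving
into pre-registered memory. No memory is actually registered with RDMA
hardware, and the function name `transfer` is made up for this example.

```python
import selectors
import socket


def transfer(payload: bytes) -> bytes:
    """Move payload across a local socket pair without blocking calls.

    The receive buffer is allocated once and filled in place, which is
    only an analogy for pre-registered memory (nothing is pinned here).
    """
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    receiver.setblocking(False)

    rx_buf = bytearray(len(payload))   # allocated once, reused for every receive
    rx_view = memoryview(rx_buf)       # slicing a memoryview does not copy
    tx_view = memoryview(payload)
    sent = received = 0

    sel = selectors.DefaultSelector()  # stands in for an event queue
    if payload:
        sel.register(sender, selectors.EVENT_WRITE)
        sel.register(receiver, selectors.EVENT_READ)
    try:
        while received < len(payload):
            for key, _ in sel.select(timeout=5):
                if key.fileobj is sender:
                    # Socket is writable: push as much as the kernel accepts.
                    sent += sender.send(tx_view[sent:])
                    if sent == len(payload):
                        sel.unregister(sender)
                else:
                    # Socket is readable: place data directly into rx_buf.
                    received += receiver.recv_into(rx_view[received:])
    finally:
        sel.close()
        sender.close()
        receiver.close()
    return bytes(rx_buf)
```

Calling `transfer(b"hello")` returns the same bytes after they have crossed
the socket pair. Note that `selectors` reports readiness rather than
completion, so this only approximates an interface in which the application
posts an operation and later collects its completion event.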
>ES-API goes a few steps further than the existing Linux AIO in providing
>explicit memory registration, which may or may not be a significant win,
>depending on application. Using Linux AIO, which does not have explicit
>memory registration, my existing SDP implementation can reach full data
>rate. The registration is done on the fly using FMRs
The cost of memory registration was evaluated in developing both iWARP and
the IB Verbs extensions. This was based on measurements taken on IB and
other hardware implementations. Not everyone was thrilled with the
performance cost, so we worked to get ES to support explicit memory
management and to provide more efficient memory management verbs. I view
this as an appropriate compromise, since a developer can choose whether to
manage the memory himself / herself or to rely upon the underlying
implementation to do so on his / her behalf and accept whatever performance
is actually delivered. There are other benefits as well, but they are
perhaps off topic for this forum.
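
The trade-off described above can be sketched with a deliberately crude cost
model. Everything in it is an assumption made for illustration: the function
name `registration_overhead` is hypothetical, costs are unitless, and the
model ignores registration caches and FMR pooling, which narrow the gap in a
real implementation.

```python
def registration_overhead(num_ops: int, reg_cost: float, explicit: bool) -> float:
    """Total registration cost for num_ops transfers that reuse one buffer.

    explicit=True  : the application registers the buffer once and reuses it.
    explicit=False : the implementation registers the buffer on every
                     operation (worst case, no caching assumed).
    """
    if num_ops <= 0:
        return 0.0
    return reg_cost if explicit else num_ops * reg_cost
```

Under these assumptions the explicit path saves `(num_ops - 1) * reg_cost`,
so it matters most for applications that reuse the same buffers many times
and least for one-shot transfers.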
Mike
>-Libor
>
>
> > On Fri, 2005-01-07 at 07:12 -0800, Michael Krause wrote:
> > > At 03:57 PM 1/6/2005, Josh England wrote:
> > > > Is it the API that is completed, or is there an implementation
> > > > written already?
> > >
> > > The API specification has been completed so now people can implement
> > > if there is interest. The API is part of the Unix branding effort
> > > done within the OpenGroup and is available to all for free. Given the
> > > desire to implement open standards within the open source community,
> > > this would seem like a logical API to support on Linux. How this is
> > > actually started / implemented on Linux is an open question. My main
> > > > reason for announcing the availability of the spec here is that
> > > the socket extensions when combined with SDP will provide optimum
> > > performance when used in conjunction with RDMA interconnects while
> > > providing a fairly familiar interface to most network application
> > > writers. For those that have implemented MPI over Sockets (not all
> > > have done this), this would also provide a cleaner mapping and still
> > > allow transparent access to RDMA with minimum performance impact.
> > >
> > > Mike
> > >
> > >
> > > > -JE
> > > >
> > > > On Thu, 2005-01-06 at 10:31 -0800, Michael Krause wrote:
> > > > >
> > > > > FYI.....The specification can be found at:
> > > > >
> > > > > http://www.opengroup.org/bookstore/catalog/c050.htm
> > > > >
> > > > > > Use of this new interface will enable Sockets based applications to
> > > > > > fully exploit the performance of RDMA interconnects through the SDP
> > > > > > wire protocol. This API also provides explicit memory management
> > > > > > taking some of the guesswork out of this thorny problem which can
> > > > > > result in some performance loss and implementation complexity within
> > > > > > the SDP layer.
> > > > >
> > > > > Mike
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Extended Sockets API (ES-API), Issue 1.0
> > > > > > The Extended Sockets API (ES-API) Technical Standard provides
> > > > > > extensions to the traditional socket API to support improved
> > > > > > efficiency in network programming. The ES-API includes: synchronous IO
> > > > > > and control operations on sockets; event queue-based management of
> > > > > > asynchronous operations; and pre-registering of memory regions that
> > > > > > will be the subject of IO operations. These facilities are intended to
> > > > > > support: improved efficiency when dealing with high numbers of socket
> > > > > > file descriptors; 'zero-copy' transmit and receive operations; and
> > > > > > improved buffer management. The ES-API also includes routines that
> > > > > > provide asynchronous IO and control operations, asynchronous operation
> > > > > > management, and memory registration functions for applications
> > > > > > manipulating sockets.
> > > > >
> > > > >
> > > > >
> > > > > Bibliographic Details
> > > > > Consortium Specifications
> > > > >
> > > > > Catalog number C050
> > > > > ISBN 1931624526
> > > > > Jan 2005
> > > > >
> > > > > OO. 72 pages.
> > > > >
> > > > > _______________________________________________
> > > > > openib-general mailing list
> > > > > openib-general at openib.org
> > > > > http://openib.org/mailman/listinfo/openib-general
> > > > >
> > > > > To unsubscribe, please visit
> > > > http://openib.org/mailman/listinfo/openib-general