[ofa-general] Re: [mwg] Re: RDMA tutorial and OFA
Jeff Squyres
jsquyres at cisco.com
Fri May 1 08:25:40 PDT 2009
Hear, hear!
FWIW, I think the attached slide shows it pictorially pretty well.
A good one-line summary: MPI is so popular [in HPC] because the simple
things are simple; with verbs, even the simple things are hard.
On May 1, 2009, at 10:27 AM, Todd Rimmer wrote:
> It goes beyond just a tutorial.
>
> In talking to customers, the consensus is that many application
> programmers struggle with sockets, and RDMA is an order of magnitude
> beyond that. It's not a cut on programmers; there are some very
> strong ones in the enterprise, but a fair percentage only have
> associate degrees or technical school training. Even the extremely
> smart ones have 100 things to juggle (and often must write code such
> that entry-level programmers can support it), so the risk/reward or
> ROI of learning RDMA has to be there. The higher the learning cost,
> the more difficult it is to justify the effort.
>
> To summarize what is really needed:
> - simplified APIs and easy migration of applications
> - SDP with zCopy was supposed to be a start; unfortunately,
> the implementation required relinking applications. That sounds simple
> to developers, but it is very tricky in the field, especially with
> complex apps, 3rd party scripts to start them, etc. A kernel-based
> "socket switch" approach is needed to make this 100% transparent.
>
> - good simple examples of how to do it, sample programs etc
> - write the samples then analyze and improve the API to
> further simplify them
> - connection establishment is still difficult in OFED. Also
> many apps are shortcutting the process by avoiding SA queries (hence
> impacting the ability of the applications to work properly with QOS,
> LMC, complex fabrics (torus, etc), Partitioning, etc).
> - either the Base API needs to improve or "helper libraries"
> are needed on top of it.
>
> - effective tools to debug applications. Right now there are very
> limited debug facilities in the ofa kernel (and most require a debug
> build), strace is not applicable to user verbs (due to kernel
> bypass), etc. You need ways to analyze resources (QPs, MRs, etc)
> while the application is running or after it has dumped. You need
> ways to trace the sequence of Verbs calls to analyze program
> behavior and bugs. Ways to analyze the "on wire" behavior (a la
> tcpdump) of an application while it's running are also needed. Right now
> it's impossible in OFED to identify how many QPs are open, let alone
> which applications are using them, etc. Tools like madeye are
> inefficient and lack the proper filtering to be effective for all
> but very simple problems.
>
> - accessibility in scripting languages and other languages (Java,
> C#, etc). Many languages have powerful capabilities to manage
> sockets and the TCP-based layers above them (HTTP, SMTP, etc).
> However, there is no effective way to use RDMA and IB in languages
> other than C. A start for scripting languages could be the
> transparent SDP approach. For Java, C++, C# and other languages
> there need to be effective APIs and libraries that map well into the
> style of the language.
>
> Todd Rimmer
> Chief Architect
> QLogic Network Systems Group
> Voice: 610-233-4852 Fax: 610-233-4777
> Todd.Rimmer at QLogic.com www.QLogic.com
>
>
> > -----Original Message-----
> > From: general-bounces at lists.openfabrics.org [mailto:general-
> > bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
> > Sent: Friday, May 01, 2009 9:18 AM
> > To: Ryan, Jim
> > Cc: iwg at lists.openfabrics.org; Paul Grun; asafs at voltaire.com; Paul
> > Gray; Working Group; Wayne Augsburger; Lloyd Dickman; Sumanta
> > Chatterjee; Mikkel Hagen; Roland Dreier (rdreier);
> > bobs at voltaire.com;
> > Jeff at lists.openfabrics.org; general at lists.openfabrics.org; Friedman;
> > bill.boas at openfabrics.org; OFA at lists.openfabrics.org;
> > Scott at lists.openfabrics.org
> > Subject: [ofa-general] Re: [mwg] Re: RDMA tutorial and OFA
> >
> > I'd also like to call the IWG's and MWG's attention to the other
> > thread currently running on the general list: "New proposal for
> > memory management."
> >
> > There are many points in there about attracting non-HPC / enterprise
> > network programmers to write verbs-based applications. It's not just
> > documentation / education that is missing -- having a series of FAQs
> > and tutorials about verbs programming is not enough. You need a
> > network programming API that is no more complex than common sockets
> > usage.
> >
> > Specifically: let's not forget that HPC (OF's biggest market right
> > now) tends to attract network programmers with PhDs, and/or who are
> > among the top programming talent in the world (yes, that's being
> > snobbish -- but it's still true). To bring OF within reach of the
> > masses, you want to lower the bar so that legions of sockets-based
> > network programmers can hope to learn/use this stuff without
> > requiring them to get a PhD first.
>
--
Jeff Squyres
Cisco Systems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jsquyres-panel-barriers-to-ofed-adoption-slide-5.pdf
Type: application/pdf
Size: 345219 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090501/8cd64f57/attachment.pdf>