[ofa-general] Re: [mwg] Re: RDMA tutorial and OFA

Jeff Squyres jsquyres at cisco.com
Fri May 1 08:25:40 PDT 2009


Hear, hear!

FWIW, I think the attached slide shows it pictorially pretty well.

A good one-line summary: MPI is so popular [in HPC] because the simple  
things are simple; with verbs, even the simple things are hard.


On May 1, 2009, at 10:27 AM, Todd Rimmer wrote:

> It goes beyond just a tutorial.
>
> In talking to customers, the consensus is that many application
> programmers struggle with sockets, and RDMA is an order of magnitude
> beyond that.  It's not a cut on programmers: there are some very
> strong ones in the enterprise, but a fair percentage have only
> associate degrees or technical school training.  Even the extremely
> smart ones have 100 things to juggle (and often must write code such
> that entry-level programmers can support it), so the risk/reward or
> ROI of learning RDMA has to be there.  The higher the learning cost,
> the more difficult it is to justify the effort.
>
> To summarize what is really needed:
> - simplified APIs and easy migration of applications
>         - SDP with zCopy was supposed to be a start; unfortunately,
> the implementation required relinking applications.  That sounds
> simple to developers, but it is very tricky in the field, especially
> with complex apps, 3rd party scripts to start them, etc.  A
> kernel-based "socket switch" approach is needed to make this 100%
> transparent.
>
> - good simple examples of how to do it, sample programs etc
>         - write the samples then analyze and improve the API to  
> further simplify them
>         - connection establishment is still difficult in OFED.  Also,
> many apps shortcut the process by avoiding SA queries, which impairs
> their ability to work properly with QoS, LMC, complex fabrics
> (torus, etc.), partitioning, etc.
>         - either the base API needs to improve or "helper libraries"
> are needed on top of it.
>
> - effective tools to debug applications.  Right now there are very
> limited debug facilities in the OFA kernel (and most require a debug
> build), strace is not applicable to user verbs (due to kernel
> bypass), etc.  You need ways to analyze resources (QPs, MRs, etc.)
> while the application is running or after it has dumped core.  You
> need ways to trace the sequence of verbs calls to analyze program
> behavior and bugs.  Ways to analyze the "on wire" behavior (a la
> tcpdump) of an application while it's running are also needed.  Right
> now it's impossible in OFED to identify how many QPs are open, let
> alone which applications are using them, etc.  Tools like madeye are
> inefficient and lack the proper filtering to be effective for
> anything but very simple problems.
>
> - accessibility in scripting languages and other languages (Java,
> C#, etc.).  Many languages have powerful capabilities to manage
> sockets and the layers above TCP (HTTP, SMTP, etc.).  However, there
> is no effective way to use RDMA and IB in languages other than C.  A
> start for scripting languages could be the transparent SDP
> approach.  For Java, C++, C#, and other languages there need to be
> effective APIs and libraries that map well into the style of the
> language.
>
> Todd Rimmer
> Chief Architect
> QLogic Network Systems Group
> Voice: 610-233-4852     Fax: 610-233-4777
> Todd.Rimmer at QLogic.com  www.QLogic.com
>
>
> > -----Original Message-----
> > From: general-bounces at lists.openfabrics.org [mailto:general-
> > bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
> > Sent: Friday, May 01, 2009 9:18 AM
> > To: Ryan, Jim
> > Cc: iwg at lists.openfabrics.org; Paul Grun; asafs at voltaire.com; Paul
> > Gray; Working Group; Wayne Augsburger; Lloyd Dickman; Sumanta
> > Chatterjee; Mikkel Hagen; Roland Dreier (rdreier);
> > bobs at voltaire.com;
> > Jeff at lists.openfabrics.org; general at lists.openfabrics.org; Friedman;
> > bill.boas at openfabrics.org; OFA at lists.openfabrics.org;
> > Scott at lists.openfabrics.org
> > Subject: [ofa-general] Re: [mwg] Re: RDMA tutorial and OFA
> >
> > I'd also like to call the IWG's and MWG's attention to the other
> > thread currently running on the general list: "New proposal for
> > memory management."
> >
> > There are many points in there about attracting non-HPC / enterprise
> > network programmers to write verbs-based applications.  It's not just
> > documentation / education that is missing -- having a series of FAQs
> > and tutorials about verbs programming is not enough.  You need a
> > network programming API that is no more complex than common sockets
> > usage.
> >
> > Specifically: let's not forget that HPC (OF's biggest market right
> > now) tends to attract network programmers with PhDs, and/or who are
> > among the top programming talent in the world (yes, that's being
> > snobbish -- but it's still true).  To bring OF within reach of the
> > masses, you want to lower the bar so that legions of sockets-based
> > network programmers can hope to learn/use this stuff without
> > requiring them to get a PhD first.
>


-- 
Jeff Squyres
Cisco Systems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jsquyres-panel-barriers-to-ofed-adoption-slide-5.pdf
Type: application/pdf
Size: 345219 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090501/8cd64f57/attachment.pdf>

