[ofa-general] Re: [mwg] Re: RDMA tutorial and OFA

Todd Rimmer todd.rimmer at qlogic.com
Fri May 1 07:27:50 PDT 2009


It goes beyond just a tutorial.

In talking to customers, the consensus is that many application programmers struggle with sockets; RDMA is an order of magnitude beyond that.  It's not a knock on programmers: there are some very strong ones in the enterprise, but a fair percentage have only associate degrees or technical-school training.  Even the extremely smart ones have 100 things to juggle (and often must write code that entry-level programmers can support), so the risk/reward, or ROI, of learning RDMA has to be there.  The higher the learning cost, the harder the effort is to justify.

To summarize what is really needed:
- simplified APIs and easy migration of applications
	- SDP with zCopy was supposed to be a start; unfortunately, the implementation required relinking applications.  That sounds simple to developers, but it is very tricky in the field, especially with complex apps, third-party scripts that start them, etc.  A kernel-based "socket switch" approach is needed to make this 100% transparent.

- good, simple examples of how to do it, sample programs, etc.
	- write the samples, then analyze and improve the API to further simplify them
	- connection establishment is still difficult in OFED.  Also, many apps shortcut the process by avoiding SA queries, which impairs their ability to work properly with QoS, LMC, complex fabrics (torus, etc.), partitioning, etc.
	- either the Base API needs to improve or "helper libraries" are needed on top of it.

- effective tools to debug applications.  Right now there are very limited debug facilities in the OFA kernel code (and most require a debug build), strace is not applicable to user verbs (due to kernel bypass), etc.  You need ways to analyze resources (QPs, MRs, etc.) while the application is running or after it has dumped core.  You need ways to trace the sequence of Verbs calls to analyze program behavior and bugs.  Ways to analyze the "on wire" behavior of a running application (a la tcpdump) are also needed.  Right now it's impossible in OFED to identify how many QPs are open, let alone which applications are using them.  Tools like madeye are inefficient and lack the filtering needed to be effective for anything but very simple problems.

- accessibility in scripting languages and other languages (Java, C#, etc.).  Many languages have powerful capabilities to manage sockets and the protocols layered above TCP (HTTP, SMTP, etc.).  However, there is no effective way to use RDMA and IB in languages other than C.  A start for scripting languages could be the transparent SDP approach.  For Java, C++, C# and other languages, there need to be effective APIs and libraries that map well onto the style of the language.

Todd Rimmer
Chief Architect 
QLogic Network Systems Group
Voice: 610-233-4852     Fax: 610-233-4777
Todd.Rimmer at QLogic.com  www.QLogic.com
 

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org [mailto:general-
> bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
> Sent: Friday, May 01, 2009 9:18 AM
> To: Ryan, Jim
> Cc: iwg at lists.openfabrics.org; Paul Grun; asafs`@voltaire.com; Paul
> Gray; Working Group; Wayne Augsburger; Lloyd Dickman; Sumanta
> Chatterjee; Mikkel Hagen; Roland Dreier (rdreier); bobs at voltaire.com;
> Jeff at lists.openfabrics.org; general at lists.openfabrics.org; Friedman;
> bill.boas at openfabrics.org; OFA at lists.openfabrics.org;
> Scott at lists.openfabrics.org
> Subject: [ofa-general] Re: [mwg] Re: RDMA tutorial and OFA
> 
> I'd also like to call the IWG's and MWG's attention to the other
> thread currently running on the general list: "New proposal for memory
> management."
> 
> There are many points in there about attracting non-HPC / enterprise
> network programmers to write verbs-based applications.  It's not just
> documentation / education that is missing -- having a series of FAQs
> and tutorials about verbs programming is not enough.  You need a
> network programming API that is no more complex than common sockets
> usage.
> 
> Specifically: let's not forget that HPC (OF's biggest market right
> now) tends to attract network programmers with PhDs, and/or who are
> among the top programming talent in the world (yes, that's being
> snobbish -- but it's still true).  To bring OF within reach of the
> masses, you want to lower the bar so that legions of sockets-based
> network programmers can hope to learn/use this stuff without requiring
> them to get a PhD first.
> 
> 
> 
> On Apr 30, 2009, at 6:12 PM, Ryan, Jim wrote:
> 
> > At the risk of piling on, I think what Lloyd is suggesting is very
> > important. The objections I continue to hear about programming using
> > RDMA are along the lines of "it's too hard" or "no one knows how to
> > do it".
> >
> > It occurs to me that concise instruction, coupled with the
> > undeniable benefits of RDMA, could provide a compelling package for
> > "RDMA for the masses".
> >
> > thanks, Jim
> >
> > From: mwg-bounces at lists.openfabrics.org [mailto:mwg-
> bounces at lists.openfabrics.org
> > ] On Behalf Of Lloyd Dickman
> > Sent: Thursday, April 30, 2009 1:17 PM
> > To: arkady kanevsky; bill.boas at openfabrics.org
> > Cc: iwg at lists.openfabrics.org; Paul Grun; OFA at lists.openfabrics.org;
> > Paul Gray; Working Group; Wayne Augsburger; Andy Grover; Richard
> > Frank;Jeff at lists.openfabrics.org; Squyres; Mikkel Hagen;
> Scott at lists.openfabrics.org
> > ; general at lists.openfabrics.org; Friedman; bobs at voltaire.com;
> > Sumanta Chatterjee;asafs`@voltaire.com; Roland Dreier
> > Subject: RE: [mwg] Re: RDMA tutorial and OFA
> >
> > I support the idea of the RDMA tutorial.  Beyond the "meat" as
> > described below, I would encourage the tutorial to include a "how to
> > program RDMA" section.  While OFA Verbs provides a rich set of
> > mechanisms, it is difficult for the average programmer to get a
> > solid handle on how to use the capabilities, register memory, ...
> > Some cookbook examples, or perhaps development of several
> > programming "patterns", can go a long way toward making RDMA a
> > much more mainstream application programming paradigm.
> >
> > Lloyd
> >
> > From: mwg-bounces at lists.openfabrics.org [mailto:mwg-
> bounces at lists.openfabrics.org
> > ] On Behalf Of arkady kanevsky
> > Sent: Thursday, April 30, 2009 11:27 AM
> > To: bill.boas at openfabrics.org
> > Cc: iwg at lists.openfabrics.org; Paul Grun; Paul Gray; OFA Marketing
> > Working Group; Wayne Augsburger; Andy Grover; Richard Frank;
> asafs`@voltaire.com
> > ; Jeff Squyres; Mikkel Hagen;general at lists.openfabrics.org; Scott
> > Friedman; bobs at voltaire.com; Sumanta Chatterjee; Roland Dreier
> > Subject: [mwg] Re: RDMA tutorial and OFA
> >
> > Keep me in the loop.
> > I am interested in doing it too.
> > Thanks,
> > Arkady
> > On Thu, Apr 30, 2009 at 1:39 PM, Bill Boas
> > <Bill.Boas at openfabrics.org> wrote:
> > Richard, Andy,
> >
> > Thanks for copying me, Richard. I had not seen Andy's email on the
> > general list.
> >
> > Figuring out how to get the tutorial and other documentation created
> > and published is on my list of things to get done in 2009 in my
> > part-time role as Exec. Dir.
> >
> > There is no funding set up for this at the moment, but I believe
> > there will be in about 30 days.
> >
> > That's because I'm thinking that we can get funding for this by
> > making it part of the funding for a new marketing plan for OFA that,
> > with Wayne Augsburger and Jim Ryan, we are preparing for the OFA Board
> > to vote on at the next con-call meeting, which is on May 20 at
> > 9:00 AM PDT.
> >
> > Would you be willing to work with me and create a small team from
> > others within OFA who have the same interest, to prepare a description
> > by May 20 of what the tutorial would look like, who would contribute
> > to it, how to get it "polished up" for web and/or book-style
> > publication, what the overall costs would be, etc.?
> >
> > My thoughts, which could be a starting point for the team's work, are
> > that we would make the creation a collective effort.
> >
> > The tutorial would have several sections, for example a general intro,
> > benefits of RDMA, applicability in HPC and Enterprise, networking
> > background, etc. Members of the Marketing Working Group would be
> > responsible for this.
> >
> > The "meat" would be sections for kernel-level things (verbs etc.),
> > then user-space things (verbs etc.), then APIs like MPI, SDP, EDS
> > etc., each section overseen by the technical leaders/maintainers of
> > the code within OFA for that section (for example Tom Talpey for
> > NFSoRDMA, or you, Richard, for RDS).
> >
> > Finally, the tutorial would have sections about the interoperability
> > testing that OFA/IOL does, but also about what customers can do on
> > their own systems. Arkady, Rupert and IOL have put in an SC09 tutorial
> > proposal that we could leverage in this section.
> >
> > To all readers of this email:
> > If you have read this far, please give us all some feedback. If you
> > have material you'd like to contribute, please say so. If there's a
> > better way, tell us what you think it is!
> >
> > Thanks,
> >
> > Bill.
> >
> > Bill Boas
> > Executive Director and Vice Chair
> > OpenFabrics Alliance
> > 510-375-8840
> > Bill.Boas at openfabrics.org
> > www.openfabrics.org
> >
> > -----Original Message-----
> > From: Richard Frank [mailto:richard.frank at oracle.com]
> > Sent: Wednesday, April 29, 2009 12:58 PM
> > To: Andy Grover
> > Cc: Bill Boas; Sumanta Chatterjee
> > Subject: Re: RDMA tutorial and OFA
> >
> > Andy, I saw your postings to ofa-general on this and I agree it
> > would be great to have this documentation.
> >
> > As OpenFabrics is really about RDMA, we need to make it simpler for
> > folks to pick up and run with RDMA concepts, vs. digging through the
> > IB specs and code examples, etc.
> >
> > Let's see what Bill Boas thinks... perhaps OFA has a writer on board
> > who can help us do this?
> >
> > I can also help provide input for a new OFA RDMA tutorial doc.
> >
> > Rick
> >
> > Andy Grover wrote:
> > > Hi Rick,
> > >
> > > Are you around for a brief chat this afternoon? I have a crazy
> > idea that
> > > involves OFA doing something (or putting up $$) and I wanted to
> > see what
> > > you thought, since you're Oracle's OFA rep, right?
> > >
> > > -- Andy
> > >
> > >
> >
> >
> >
> > --
> > Cheers,
> > Arkady Kanevsky
> 
> 
> --
> Jeff Squyres
> Cisco Systems
> 


