[ofa-general] New proposal for memory management

Barrett, Brian W bwbarre at sandia.gov
Wed Apr 29 15:28:06 PDT 2009


On 4/29/09 16:07, "Woodruff, Robert J" <robert.j.woodruff at intel.com> wrote:

> Brian wrote,
> 
>> And Open Fabrics is the only "commodity" interface that makes implementers
>> go through these pains.  Myrinet's MX, Cray's Portals, and Quadrics' Tports
>> all handle the issues either at the driver library or kernel module level.
> 
> One important note is that, in general, Myrinet, Quadrics, and even Portals
> were designed primarily to run MPI, so it is no surprise that their
> interfaces map almost 1:1 to the MPI interfaces. Also, note that all of
> these use a tag-matching capability, which also seems to map well to MPI.
> RDMA/OFA verbs were designed to be a more general interface to support
> lots of ULPs, networking (TCP/IP), storage, etc., not just MPI.

True, although any of those could be extended to support the features
necessary for storage and such (and many already support IP).  The code
complexity claim also holds when comparing against sockets (TCP, in
particular): the sockets path is a lot less code and doesn't make us jump
through nearly as many hoops.  Obviously it doesn't perform as well, but
5-6x the code complexity for OFED isn't a good thing.

> That said, for hardware that does support these tag-matching capabilities,
> like Myrinet, QLogic's HCA (i.e. PSM), OpenMX, and even Quadrics, maybe OFA
> should have a generic set of tag-matching verbs that the MPIs could use
> instead of the RDMA verbs. The IHVs that support tag-matching, like QLogic,
> MX, and others, could plug into this generic tag-matching infrastructure.
> The MPIs would then only have to write one driver to support all these
> different IHVs, and that driver would be a very simple one, since the
> tag-matching verbs would map almost 1:1 to the MPI interfaces, like MX or
> PSM do.

I think there are other problems with the verbs interface that would still
make MPI implementers twitch (some of which are in the slides Jeff sent out
to begin this discussion).  But I certainly wouldn't say no to a real set of
tag matching primitives.  Of course, that opens a whole can of worms that
I'm not sure OFED is ready to deal with.

It also may or may not solve the memory registration problem.  If the memory
passed to the matching verb still had to be registered, we wouldn't have
solved the problem that started this discussion.  So the verb would also have
to handle memory registration, which seems to go against the general "OFA
way".
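
To make the registration problem concrete, here is a minimal sketch in C of
the kind of registration cache an MPI ends up carrying when the interface
does not handle registration itself.  Everything in it is hypothetical:
fake_register stands in for a real pin/register call (ibv_reg_mr, in a verbs
program), and cached_register, cache_invalidate, and the 8-slot table are
illustrative names, not code from OFED or from any actual MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an expensive pin/register call.
 * It only counts how often it is invoked and returns a fake handle. */
static int reg_calls = 0;
static int fake_register(void *addr, size_t len) {
    (void)addr;
    (void)len;
    return ++reg_calls;
}

#define CACHE_SLOTS 8

struct reg_entry {
    uintptr_t base;   /* start of the registered region */
    size_t    len;    /* length of the registered region */
    int       handle; /* handle returned by the register call */
    int       valid;  /* 0 once the entry has been invalidated */
};

static struct reg_entry cache[CACHE_SLOTS];
static int next_victim = 0;

/* Return a handle covering [addr, addr+len), registering only on a miss. */
int cached_register(void *addr, size_t len) {
    uintptr_t a = (uintptr_t)addr;
    assert(len > 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && a >= cache[i].base &&
            a + len <= cache[i].base + cache[i].len)
            return cache[i].handle;          /* hit: no registration cost */
    }
    int slot = next_victim;                  /* miss: round-robin eviction */
    next_victim = (next_victim + 1) % CACHE_SLOTS;
    /* A real cache would deregister the evicted entry here. */
    cache[slot] = (struct reg_entry){ a, len, fake_register(addr, len), 1 };
    return cache[slot].handle;
}

/* Drop every cached entry that overlaps [addr, addr+len). */
void cache_invalidate(void *addr, size_t len) {
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && a < cache[i].base + cache[i].len &&
            cache[i].base < a + len)
            cache[i].valid = 0;
    }
}
```

The lookup is the easy part.  The hard part is knowing when to call
cache_invalidate, because the application can free or unmap memory without
the MPI ever hearing about it.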

> Heck, maybe we should even encourage the IBTA and iWARP associations to add
> tag-matching as a feature to the next version of the IBTA and iWARP specs.
> If they did that, it would make the MPI implementers' lives a lot easier. I
> would rather see that done than hack thousands of lines of memory
> registration caching code and stuff it into the kernel.

I would love matching in the spec.  But I'm not sure it directly solves any
of the problems Jeff brought up in his talk at Sonoma.  I can cope with
having to do matching in the MPI (I'm going to have that code anyway for TCP
networks).  But it's the connection management, the memory pinning, and the
receive buffer space requirements that really drive us nuts and require the
bulk of our effort.
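
For reference, the matching an MPI does in software is roughly the following.
This is a simplified sketch in C, not code from any real MPI: the queue
layout, the ANY wildcard, and the names post_recv and msg_arrived are all
made up for illustration, and a real implementation would also match on
communicator and handle buffer ownership.

```c
#include <assert.h>
#include <stddef.h>

#define ANY  (-1)  /* wildcard for source or tag (illustrative) */
#define QMAX 16    /* fixed queue capacity, to keep the sketch short */

struct entry { int src; int tag; int id; };
struct queue { struct entry e[QMAX]; int n; };

/* posted: receives waiting for a message.
 * unexpected: messages that arrived before a matching receive. */
static struct queue posted, unexpected;

/* Does a receive request (possibly with wildcards) accept this message? */
static int match(const struct entry *want, int src, int tag) {
    return (want->src == ANY || want->src == src) &&
           (want->tag == ANY || want->tag == tag);
}

/* Remove slot i, keeping the remaining entries in order; return its id. */
static int take(struct queue *q, int i) {
    int id = q->e[i].id;
    for (int j = i; j + 1 < q->n; j++)
        q->e[j] = q->e[j + 1];
    q->n--;
    return id;
}

static void push(struct queue *q, int src, int tag, int id) {
    assert(q->n < QMAX);
    q->e[q->n++] = (struct entry){ src, tag, id };
}

/* Post a receive.  Returns the id of the oldest already-arrived message
 * that matches, or -1 if the receive was queued instead. */
int post_recv(int src, int tag, int recv_id) {
    struct entry want = { src, tag, recv_id };
    for (int i = 0; i < unexpected.n; i++)
        if (match(&want, unexpected.e[i].src, unexpected.e[i].tag))
            return take(&unexpected, i);
    push(&posted, src, tag, recv_id);
    return -1;
}

/* A message arrived.  Returns the id of the oldest posted receive that
 * matches, or -1 if the message was stored as unexpected. */
int msg_arrived(int src, int tag, int msg_id) {
    for (int i = 0; i < posted.n; i++)
        if (match(&posted.e[i], src, tag))
            return take(&posted, i);
    push(&unexpected, src, tag, msg_id);
    return -1;
}
```

The unexpected queue is also where the receive buffer space question shows
up: every message that arrives before its receive is posted has to be held
somewhere until a match comes along.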

Brian 

--
   Brian W. Barrett
   Dept. 1423: Scalable System Software
   Sandia National Laboratories
