[ofa-general] New proposal for memory management
Jeff Squyres
jsquyres at cisco.com
Mon May 4 17:25:23 PDT 2009
I think that this thread has gotten to the point where people are no
longer reading each post carefully and are therefore re-hashing points
that have already been discussed. It has therefore reached the end of
its usefulness.
It was suggested today that a teleconference to discuss these issues
might be much more useful (an hour-long teleconference can save a
week's worth of emails!). This will be a technical call to discuss
memory registration issues; it will not be an EWG call. I've set up a
WebEx call for next Monday at the "normal" time: noon US Eastern, 9am
US Pacific, 7pm Israel. The invite will be coming to the ewg and
general lists shortly.
*** PLEASE USE THE WEBEX URL TO JOIN THE TELECONFERENCE (vs. just
dialing in)
(when you log on, it'll prompt you for a phone number to call you back;
yes, non-US phone numbers are supported)
I will make up a small number of slides that attempt to summarize all
the arguments (on both sides) so far. Hopefully, they can serve as a
starting point for discussion.
Thanks; see you next Monday.
On May 1, 2009, at 1:09 PM, Roland Dreier (rdreier) wrote:
> > You mentioned that doing this stuff is a choice; the choice that
> > MPIs/ULPs/applications therefore have is:
> >
> > - don't use registration caches/memory allocation hooking, have
> > terrible performance
> > - use registration caches/memory allocation hooking, have good
> > performance
>
> I think it's a bit of a stretch to suggest that all or even most
> userspace RDMA applications have the same need for registration
> caching as MPI. In fact my feeling is that the fact that MPI must
> deal with RDMA to arbitrary memory allocated by an application out of
> MPI's control is the exception. My most recent experience was with
> Cisco's RAB library, and in that case we simply designed the library
> so that all RDMA was done to memory allocated by the library -- so no
> need for a registration cache, and in fact no need for registration
> in any fast path. I suspect that the majority of code written to use
> RDMA natively will be designed with similar properties.
>
> So this proposal is very much an MPI-specific interface. Which leads
> to my next point. I have no doubt that the MPI community has a very
> good idea of a memory registration interface that would make MPI
> implementations simpler and more robust. However I don't think
> there's quite as much expertise about what the best way to implement
> such an interface is.
>
> My initial reaction is that I don't want to extend the kernel ABI
> with a set of new MPI-specific verbs if there's a way around it.
> We've been told over and over that the registration cache is complex
> and fragile code -- but moving complex and fragile code into the
> kernel doesn't magically make it any simpler or more robust, it just
> means that bugs now crash the whole system instead of just affecting
> one process.
>
> Now, of course MMU notifiers allow the kernel to know reliably when a
> process's page tables change, which means that all the complicated
> malloc hooking etc. is not needed. So that complexity is avoided in
> the kernel. But suppose I give userspace the same MMU notifier
> capability (e.g. I add a system call like "if any mappings in the
> virtual address range X ... Y change, then write a 1 to virtual
> address Z") -- then what do I gain from having the rest of the
> registration caching in the kernel? (And avoiding the duplication of
> caching code between multiple MPI implementations is not an answer --
> it's quite feasible to put the caching code into libibverbs if that's
> the best place for it.)
>
> - R.
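
[For readers following along: a toy model of the userspace cache that
the quoted "write a 1 to virtual address Z" idea would allow. This is
only a sketch under stated assumptions. No such system call exists in
this thread beyond the one-line description above, and every name here
(RegistrationCache, FakeKernel, watch, unmap) is hypothetical. The
"kernel" is simulated in Python so the control flow can be run; it does
not call libibverbs or any real RDMA API.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    handle: int       # stand-in for a memory registration handle
    flag: List[int]   # one-element list standing in for "virtual address Z"


class RegistrationCache:
    """Userspace cache that trusts an entry only while its flag is still 0."""

    def __init__(self,
                 register: Callable[[int, int], int],
                 deregister: Callable[[int], None],
                 watch: Callable[[int, int, List[int]], None]):
        # All three callables are hypothetical hooks supplied by the caller.
        self._register = register
        self._deregister = deregister
        self._watch = watch
        self._entries: Dict[Tuple[int, int], Entry] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, addr: int, length: int) -> int:
        key = (addr, length)
        entry = self._entries.get(key)
        if entry is not None and entry.flag[0] == 0:
            self.hits += 1            # mapping unchanged: reuse registration
            return entry.handle
        if entry is not None:
            self._deregister(entry.handle)   # mapping changed: entry is stale
        flag = [0]
        self._watch(addr, length, flag)      # arm the notification first
        handle = self._register(addr, length)
        self._entries[key] = Entry(handle, flag)
        self.misses += 1
        return handle


class FakeKernel:
    """Simulated kernel side: hands out handles and flips watch flags."""

    def __init__(self):
        self.next_handle = 1
        self.live = set()
        self.watches: List[Tuple[int, int, List[int]]] = []

    def register(self, addr: int, length: int) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.live.add(handle)
        return handle

    def deregister(self, handle: int) -> None:
        self.live.discard(handle)

    def watch(self, addr: int, length: int, flag: List[int]) -> None:
        self.watches.append((addr, length, flag))

    def unmap(self, addr: int, length: int) -> None:
        # "If any mappings in the range change, write a 1 to Z."
        for w_addr, w_len, flag in self.watches:
            if w_addr < addr + length and addr < w_addr + w_len:
                flag[0] = 1
```

[In this model the fast path is a dictionary lookup plus one flag read,
with no malloc hooking; whether a real interface could be made that
cheap and race-free is exactly the open question in the thread.]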
--
Jeff Squyres
Cisco Systems