[ofa-general] New proposal for memory management

Mon May 4 17:25:23 PDT 2009

I think that this thread has gotten to the point where people are no
longer reading each post carefully and are rehashing points that have
already been discussed.  It has therefore reached the end of its
usefulness.

It was suggested today that a teleconference to discuss these issues  
might be much more useful (an hour-long teleconference can save a  
week's worth of emails!).  This will be a technical call to discuss  
memory registration issues; it will not be an EWG call.  I've set up a
WebEx call for next Monday at the "normal" time: noon US Eastern, 9am  
US Pacific, 7pm Israel.  The invite will be coming to the ewg and  
general lists shortly.

*** PLEASE USE THE WEBEX URL TO JOIN THE TELECONFERENCE (vs. just dialing in)
     (when you log on, it'll prompt you for a phone number to call you back;
     yes, non-US phone numbers are supported)

I will make up a small number of slides that attempt to summarize all  
the arguments (on both sides) so far.  Hopefully, they can serve as a  
starting point for discussion.

Thanks; see you next Monday.

On May 1, 2009, at 1:09 PM, Roland Dreier (rdreier) wrote:

>  > You mentioned that doing this stuff is a choice; the choice that
>  > MPIs/ULPs/applications therefore have is:
>  >
>  > - don't use registration caches/memory allocation hooking, have
>  > terrible performance
>  > - use registration caches/memory allocation hooking, have good
>  > performance
>
> I think it's a bit of a stretch to suggest that all or even most
> userspace RDMA applications have the same need for registration caching
> as MPI.  In fact my feeling is that the fact that MPI must deal with
> RDMA to arbitrary memory allocated by an application out of MPI's
> control is the exception.  My most recent experience was with Cisco's
> RAB library, and in that case we simply designed the library so that
> all RDMA was done to memory allocated by the library -- so no need for
> a registration cache, and in fact no need for registration in any fast
> path.  I suspect that the majority of code written to use RDMA natively
> will be designed with similar properties.
>
> So this proposal is very much an MPI-specific interface.  Which leads to
> my next point.  I have no doubt that the MPI community has a very good
> idea of a memory registration interface that would make MPI
> implementations simpler and more robust.  However I don't think there's
> quite as much expertise about what the best way to implement such an
> interface is.
>
> My initial reaction is that I don't want to extend the kernel ABI with
> a set of new MPI-specific verbs if there's a way around it.  We've been
> told over and over that the registration cache is complex and fragile
> code -- but moving complex and fragile code into the kernel doesn't
> magically make it any simpler or more robust; it just means that bugs
> now crash the whole system instead of just affecting one process.
>
> Now, of course MMU notifiers allow the kernel to know reliably when a
> process's page tables change, which means that all the complicated
> malloc hooking etc. is not needed.  So that complexity is avoided in the
> kernel.  But suppose I give userspace the same MMU notifier capability
> (e.g. I add a system call like "if any mappings in the virtual address
> range X ... Y change, then write a 1 to virtual address Z") -- then what
> do I gain from having the rest of the registration caching in the
> kernel?  (And avoiding the duplication of caching code between multiple
> MPI implementations is not an answer -- it's quite feasible to put the
> caching code into libibverbs if that's the best place for it.)
>
>  - R.
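
For illustration, a minimal C sketch of the "all RDMA goes to library-allocated memory" design described for the RAB library (this is not that library's code). It assumes a hypothetical `register_region()` stand-in for a real verbs registration call such as `ibv_reg_mr()`; the pool registers one region at startup and then hands out buffers that are already covered by that registration, so the fast path never registers anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a real registration call such as ibv_reg_mr();
 * it only counts how many times registration happens. */
static int registration_count = 0;

struct region {
    void *addr;
    size_t len;
    uint32_t key; /* stands in for an lkey/rkey */
};

static struct region register_region(void *addr, size_t len)
{
    struct region r = { addr, len, (uint32_t)++registration_count };
    return r;
}

/* A library-owned pool: registered once, then carved into fixed-size buffers. */
struct pool {
    struct region mr;
    size_t buf_size;
    size_t nbufs;
    size_t next_free; /* simple bump allocator; no per-buffer free for brevity */
};

static int pool_init(struct pool *p, size_t buf_size, size_t nbufs)
{
    void *mem = malloc(buf_size * nbufs);
    if (mem == NULL)
        return -1;
    p->mr = register_region(mem, buf_size * nbufs); /* the only registration */
    p->buf_size = buf_size;
    p->nbufs = nbufs;
    p->next_free = 0;
    return 0;
}

/* Fast path: return a buffer already covered by p->mr, or NULL if exhausted. */
static void *pool_get(struct pool *p)
{
    if (p->next_free == p->nbufs)
        return NULL;
    char *base = p->mr.addr;
    return base + p->buf_size * p->next_free++;
}

static void pool_destroy(struct pool *p)
{
    /* A real library would deregister (e.g. ibv_dereg_mr()) before freeing. */
    free(p->mr.addr);
}
```

A caller runs `pool_init()` once and `pool_get()` per message. The trade-off is flexibility: applications can only do RDMA on buffers the library hands out, which is exactly what MPI cannot require of arbitrary application memory.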

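A minimal C sketch of a userspace registration cache driven by the kind of notification in the quoted message ("if any mappings in X ... Y change, write a 1 to Z"). `watch_range()` is hypothetical: no such system call exists, so here it only clears the per-entry flag, and a test can stand in for the kernel by setting that flag. `do_register()` and `do_deregister()` are likewise hypothetical stand-ins for verbs calls such as `ibv_reg_mr()` and `ibv_dereg_mr()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

/* Counts calls to the hypothetical registration stand-in. */
static int reg_calls = 0;

struct cache_entry {
    uintptr_t start, end;  /* registered range [start, end) */
    uint32_t key;          /* stands in for an lkey/rkey */
    volatile int changed;  /* word the hypothetical kernel hook would set to 1 */
    int in_use;
};

static struct cache_entry cache[CACHE_SLOTS];

/* Hypothetical: ask the kernel to write 1 to *flag if any mapping in
 * [start, end) changes. This stub only clears the flag. */
static void watch_range(uintptr_t start, uintptr_t end, volatile int *flag)
{
    (void)start;
    (void)end;
    *flag = 0;
}

/* Hypothetical stand-ins for registration and deregistration. */
static uint32_t do_register(uintptr_t start, uintptr_t end)
{
    (void)start;
    (void)end;
    return (uint32_t)++reg_calls;
}

static void do_deregister(struct cache_entry *e)
{
    e->in_use = 0; /* a real cache would call e.g. ibv_dereg_mr() here */
}

/* Return a key covering [addr, addr + len), registering only on a miss or
 * when the cached range has been flagged as changed. */
static int cache_lookup(uintptr_t addr, size_t len, uint32_t *key)
{
    uintptr_t end = addr + len;
    struct cache_entry *free_slot = NULL;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &cache[i];
        if (e->in_use && e->changed)
            do_deregister(e);   /* stale: mappings changed underneath us */
        if (!e->in_use) {
            if (free_slot == NULL)
                free_slot = e;
            continue;
        }
        if (e->start <= addr && end <= e->end) {
            *key = e->key;      /* hit: a plain memory read, no registration */
            return 0;
        }
    }
    if (free_slot == NULL)
        return -1;              /* full; a real cache would evict an entry */
    free_slot->start = addr;
    free_slot->end = end;
    /* Arm the watch before registering so that, in a real implementation,
     * a mapping change racing with registration is not missed. */
    watch_range(addr, end, &free_slot->changed);
    free_slot->key = do_register(addr, end);
    free_slot->in_use = 1;
    *key = free_slot->key;
    return 0;
}
```

Under that assumption, the staleness check is an ordinary load in userspace, so neither malloc hooking nor the rest of the cache logic has to move into the kernel.
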
-- 
Jeff Squyres
Cisco Systems