[ofa-general] New proposal for memory management

Thu Apr 30 10:51:32 PDT 2009

On 4/30/09 11:37 , "Woodruff, Robert J" <robert.j.woodruff at intel.com> wrote:

> Brian wrote,
>> In short, while I'd love to see tag matching, I'd rather make sure all the
>> other issues get solved properly first.  Otherwise, we've just added another
>> interface that drives me up the wall.
> 
> Well I agree that the other issues will need to be solved in any case, as
> even if we were able to get tag-matching into all the hardware, the lead
> time for this would be very long, so MPIs are going to have to deal with
> using the current verbs for the foreseeable future.
> 
> I am still not sure that having the OFA kernel developers create and manage
> a registration cache in the kernel is a good idea. As Alexander pointed out,
> I am not sure that a one-size-fits-all memory registration cache can be
> developed that meets the needs of all applications.
> 
> Of course, if memory registration were not so expensive and could simply be
> done for each operation without a big performance penalty, then we would not
> need caching at all. So another way to fix this is to just fix the hardware so
> memory registration can be done in the speed path, but this is unfortunately
> something that will likely not happen either.

Well, here's the situation today.  Every MPI implementation out there has a
registration cache in order to stay competitive on performance.  And every MPI
implementation uses one of a small number of hacks to figure out when memory
is given back to the OS, all of which are broken in at least one well-known
but subtle way.  So today every MPI implementation (and MPI, if marketing folk
are to be believed, is a large portion of IB's business) is doing dangerous
things to compete on performance.

All we want is *SOMETHING* we can do that we know is safe.  The registration
cache seems like the safest, but the notifier would be better than where we
are today if we can get the important race conditions out of it.  To answer
Alexander's question, if the kernel cache doesn't work for some particular
application, an MPI implementation can make the choice to go back to the
unsafe practices and run the cache entirely within the MPI implementation.
Personally, I don't think many would, but the option hasn't been removed.

The current state of the field is unacceptably stupid and I'm actually
amazed there's any resistance to fixing the problem.  We need both
short-term and long-term solutions (which might not be the same, if hardware
people want to tackle the problem properly), and they need to be usable by a
variety of ULPs.

Brian

--
   Brian W. Barrett
   Dept. 1423: Scalable System Software
   Sandia National Laboratories