[ofa-general] Re: New proposal for memory management
Jeff Squyres
jsquyres at cisco.com
Wed Apr 29 05:15:57 PDT 2009
On Apr 29, 2009, at 12:03 AM, Jason Gunthorpe wrote:
> I've often wondered, wouldn't it just be fine for MPI if the entire
> process address space is kept pinned, registered and consistent with
> the HCA? The process would opt in to this behavior during MPI
> startup. Similar in spirit to the all physical memory registration the
> kernel can do.
>
An interesting idea. As I understand it, you would essentially have
to pre-allocate memory to all MPI processes, registering all available
RAM. After thinking about this a little, I think there are still
a few problems, though:
- How much memory do you give to each MPI process? (phys_ram -
OS_overhead) / num_mpi_processes? What if each MPI process is not
created equal -- some need more RAM than others? Does each MPI
process need to know at the beginning of time the max memory that it
might need in the future? That could be quite difficult to know -- it
seems like a large new restriction to impose on users.
- As we head towards "manycore", the above problem will get [much]
worse, because I think we'll be heading back to the days of running
multiple different MPI jobs on a single machine. These jobs will have
no a priori knowledge of each other; if the 2nd MPI job launched on a
machine needs more than (phys_ram - OS_overhead) / num_processors, how
is that coordinated with the 1st MPI job that is already running on
the same machine?
- What about any other (non-MPI) process that needs to run? If all
memory after the OS is registered / unswappable / allocated to MPI
processes, then how do random processes get any memory to run? (e.g.,
shell scripts, daemons, ...) If you simply leave X space
unregistered specifically for such non-MPI processes, how do you decide
the value of X?
- The preallocation/registration of memory must happen pre-main()
because the first MPI function that is invoked (MPI_Init()) may not
occur until well after main(), and potentially after some calls to
malloc (etc.). For example, the following is a valid MPI program:
    int main(...) {
        int *a = malloc(...);
        MPI_Init(...);
        MPI_Send(a, ...);
        ...
    }
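To make the equal-split arithmetic from the first bullet concrete, here is a minimal sketch in C. The function names (per_process_budget, count_over_budget) and their parameters are hypothetical, invented only for illustration; nothing here is an MPI or verbs API, and a real scheme would have to decide what "OS overhead" actually means.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the equal share of registrable memory per MPI
 * process, i.e. (phys_ram - os_overhead) / num_procs, all in bytes.
 * Returns 0 when there is nothing left to share. */
uint64_t per_process_budget(uint64_t phys_ram, uint64_t os_overhead,
                            uint64_t num_procs)
{
    if (num_procs == 0 || os_overhead >= phys_ram)
        return 0;
    return (phys_ram - os_overhead) / num_procs;
}

/* Hypothetical helper: count how many processes need more than the
 * equal share. needs[i] is the peak memory (bytes) of process i,
 * which is exactly the number that is hard to know up front. */
size_t count_over_budget(const uint64_t *needs, size_t n, uint64_t budget)
{
    size_t over = 0;
    for (size_t i = 0; i < n; i++) {
        if (needs[i] > budget)
            over++;
    }
    return over;
}
```

For example, with 16 GiB of RAM, 2 GiB set aside for the OS, and 8 processes, per_process_budget() gives 1.75 GiB each; a process that needs 3 GiB does not fit, even if its neighbors need far less.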
Re-reading your brief text, I'm wondering if I missed the zen of what
you're trying to suggest...? If I'm off the mark, can you explain
more? Thanks.
--
Jeff Squyres
Cisco Systems