[ofa-general] Re: New proposal for memory management

John A. Gregor john.gregor at qlogic.com
Wed Apr 29 18:28:06 PDT 2009


Jeff Squyres <jsquyres at cisco.com> wrote:

> On Apr 29, 2009, at 5:04 PM, Ralph Campbell wrote:
> > It seems to me that this is mostly an issue for rendezvous sends.
> > Eager sends can use a pool of preregistered memory which are
> > reused as data is copied from the buffer and ibv_post_recv()'ed.
>
> Yes, exactly!

Another MPI/Verbs neophyte chiming in...

So, for a rendezvous, I imagine there's an exchange that looks vaguely
like:

        A               B
        |               |
        +-- RTS ------->|
        |               |
        |<---------CTS--+
        |               |
        +-- DATA ------>|
        +-- DATA ------>|
        +-- DATA ------>|
        +-- DATA ------>|
        :               :

And the first critical path for B is to receive the RTS and turn it
around into a CTS as quickly as possible.
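
To make the exchange above concrete, here is a toy Python model of it. It is only a sketch: the `Receiver` class, `rendezvous_send()`, and the `chunk_size` parameter are names made up for illustration, not part of any MPI or verbs API, and the "network" is just direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Side B: answers an RTS with a CTS, then accepts DATA chunks."""
    expected: int = 0  # bytes announced by the RTS
    received: bytearray = field(default_factory=bytearray)

    def on_rts(self, length: int) -> str:
        # Critical path: note the size and turn the CTS around right away.
        self.expected = length
        return "CTS"

    def on_data(self, chunk: bytes) -> None:
        self.received.extend(chunk)

    def done(self) -> bool:
        return len(self.received) == self.expected


def rendezvous_send(receiver: Receiver, payload: bytes, chunk_size: int = 4) -> list:
    """Side A: send RTS, wait for CTS, then stream the payload as DATA."""
    trace = ["RTS"]
    reply = receiver.on_rts(len(payload))
    trace.append(reply)
    if reply != "CTS":
        raise RuntimeError("receiver refused the transfer")
    for off in range(0, len(payload), chunk_size):
        receiver.on_data(payload[off:off + chunk_size])
        trace.append("DATA")
    return trace
```

Everything B does inside `on_rts()` sits on the critical path, which is why the less it has to do there, the better.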

It seems like all you need at the time of the CTS is a physical address
(or a set of them) to program into your hardware, so that it can set up
the mapping from memory region to chip resources.

While the CTS is flying back to the requester, there is time for playing
with mappings and other tricks - anything that doesn't invalidate the
physical mappings in the hardware.

So, how about this:

Maintain a pool of pre-pinned pages.

When an RTS comes in, use one of the pre-pinned buffers as the place the
DATA will land.  Set up the remaining hw context to enable receipt into
the page(s) and fire back your CTS.
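
A minimal sketch of that RTS handler, again as a toy Python model. `PinnedPool`, `handle_rts()`, the 4096-byte page size, and the integer "frame numbers" are all assumptions for illustration; real code would pin and register the pages ahead of time and program the adapter where the comment says so.

```python
PAGE_SIZE = 4096  # assumed page size


class PinnedPool:
    """Toy stand-in for a pool of pre-pinned physical pages (frame numbers)."""

    def __init__(self, frames):
        self.free = list(frames)

    def take(self, count):
        if count > len(self.free):
            raise MemoryError("pool exhausted; fall back to the slow path")
        taken, self.free = self.free[:count], self.free[count:]
        return taken

    def give(self, frames):
        self.free.extend(frames)


def handle_rts(pool, length):
    """Pick landing pages for `length` bytes and build the CTS reply."""
    pages_needed = -(-length // PAGE_SIZE)  # ceiling division
    frames = pool.take(pages_needed)
    # Real hardware would be programmed with these frames at this point.
    return {"type": "CTS", "frames": frames}
```

The point is that nothing in `handle_rts()` touches the application's own pages, so no pinning or registration happens between RTS and CTS.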

While the CTS is in flight and the DATA is streaming back (and you
therefore have a couple of microseconds to play with), change the
application's virt-to-phys mapping so that the original virtual address
now points at the pre-pinned page.

If the transfer didn't completely fill a page, provide an option to copy
from the original page into the new page any bytes that the transfer
didn't overwrite.

Add the original physical page into the pool of pages available as
a buffer.  The app goes its merry way using the new physical page.
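
The remap, partial-page copy, and recycle steps can be sketched as below. This is a toy model under loud assumptions: the page table is a plain dict from virtual page number to frame number, physical memory is a dict of bytearrays, the page is 8 bytes so the result is easy to read, and everything runs sequentially rather than overlapping with the transfer. `receive_into_pinned()` is a made-up name.

```python
PAGE_SIZE = 8  # tiny page so the example is easy to follow


def receive_into_pinned(page_table, phys, pool, vpage, data):
    """Land `data` in a pooled frame, then swap that frame in behind `vpage`."""
    if len(data) > PAGE_SIZE:
        raise ValueError("this sketch handles a single page only")
    new_frame = pool.pop()        # pre-pinned landing page
    old_frame = page_table[vpage]
    # The DATA phase writes into the pre-pinned frame.
    phys[new_frame][:len(data)] = data
    # Partial page: keep the tail that the transfer did not overwrite.
    phys[new_frame][len(data):] = phys[old_frame][len(data):]
    # Remap: the application's virtual page now points at the new frame.
    page_table[vpage] = new_frame
    # The original frame becomes available as a future landing buffer.
    pool.append(old_frame)
    return new_frame
```

For example, with `phys = {0: bytearray(b"AAAAAAAA"), 1: bytearray(8)}`, `page_table = {7: 0}`, and `pool = [1]`, receiving `b"xyz"` into virtual page 7 leaves page 7 backed by frame 1 holding `xyzAAAAA`, and frame 0 back in the pool.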

Of course, this does presuppose a system call that looks like
phys_swap(void *a, void *b) that would atomically swap the physical pages
backing virtual addresses a and b.  And I know some architectures have
funny page-coloring issues wrt what virtual addresses can map to what
physical addresses.  So it might have to be a pool per color for those.
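
To show what I mean by the swap and by a pool per color, here is a toy Python model. No such system call exists as far as I know, so `phys_swap()` here only swaps entries in a dict standing in for the page table. The coloring rule (color is the page number modulo `NUM_COLORS`, and a virtual page takes a frame of the same color) is a simplifying assumption, not how any particular architecture defines it.

```python
NUM_COLORS = 4  # assumed; real values are architecture specific


def color_of(page_number):
    """Assumed coloring rule: low bits of the page number."""
    return page_number % NUM_COLORS


def phys_swap(page_table, vpage_a, vpage_b):
    """Model of the hypothetical call: exchange the frames behind two pages."""
    page_table[vpage_a], page_table[vpage_b] = (
        page_table[vpage_b],
        page_table[vpage_a],
    )


class ColoredPool:
    """One free list of pre-pinned frames per page color."""

    def __init__(self, frames):
        self.by_color = {c: [] for c in range(NUM_COLORS)}
        for frame in frames:
            self.give(frame)

    def give(self, frame):
        self.by_color[color_of(frame)].append(frame)

    def take_for(self, vpage):
        bucket = self.by_color[color_of(vpage)]
        if not bucket:
            raise MemoryError("no pre-pinned frame of the right color")
        return bucket.pop()
```

A real version would have to do the swap atomically in the kernel; a Python tuple assignment says nothing about that part.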

Anyway, just a thought.

-John Gregor


