[openib-general] Re: RDMA memory registration

Fri Apr 29 14:42:55 PDT 2005

Oops, hit send too soon. Finishing the response...

On 4/29/05, Caitlin Bestler <caitlin.bestler at gmail.com> wrote:
> On 4/29/05, Roland Dreier <roland at topspin.com> wrote:
> >     Bill> I'm very confused at this point. Can you briefly explain how
> >     Bill> this works, or point me to a description? I don't see how
> >     Bill> you could do user level I/O without registering the memory
> >     Bill> with the hardware. I'm especially confused by the comment
> >     Bill> (may not have been yours) that the memory doesn't have to be
> >     Bill> pinned.  -- Bill Jordan InfiniCon Systems
> >
> > You add a hook to the kernel so it tells you if a page is about to be
> > paged out or otherwise moved.  Then you set a bit in the adapter's page
> > table so that it won't try to access that page without telling you.
> > If the adapter asks for the page, you get the kernel to fault the page
> > in and program the new physical mapping in the adapter.
> >
> 
> Yes, and you could even have a system that was capable of doing
> DMA to a user virtual map (in fact some minis back around 1980
> had exactly that capability).
> 
> But there are *two* issues involved here:
> 
>     One is that the RDMA hardware, however it is marketed, essentially
>     needs to act as an MMU. That means that it has to be synchronized
>     with the normal MMU. The traditional sledge-hammer approach to
> 
    "synchronizing" is to require that the mapping be frozen. You *could*
    define a method that attempts to be more dynamic in this synchronization,
    but since it is an ex post facto mechanism that must work with multiple
    hardware cards, it needs to be defined recognizing that it is not
    instantaneous. It is virtually the same problem as memory suspend in
    general: the RDMA hardware's MMU is not making calculations for each
    and every access to the host bus.
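
    As a rough illustration of the dynamic scheme Roland describes, here is
    a toy model in Python. Everything in it is invented for the sketch
    (ToyKernel, ToyAdapter, the hooks list); it is not any real kernel or
    verbs interface, and it assumes the invalidation reaches the adapter
    synchronously, which is exactly what real hardware cannot promise.

```python
class ToyKernel:
    """Hypothetical stand-in for the host VM; not a real kernel interface."""

    def __init__(self):
        self.resident = {}   # virtual page -> physical frame
        self.next_frame = 0
        self.hooks = []      # callbacks run before a page is paged out or moved

    def fault_in(self, vpage):
        # Bring the page in (assigning a fresh frame) if it is not resident.
        if vpage not in self.resident:
            self.resident[vpage] = self.next_frame
            self.next_frame += 1
        return self.resident[vpage]

    def page_out(self, vpage):
        # Tell every registered adapter first, then drop the mapping.
        for hook in self.hooks:
            hook(vpage)
        self.resident.pop(vpage, None)


class ToyAdapter:
    """Hypothetical adapter-side page table with a per-entry valid bit."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.table = {}      # virtual page -> (physical frame, valid)

    def map_page(self, vpage):
        self.table[vpage] = (self.kernel.fault_in(vpage), True)

    def invalidate(self, vpage):
        # Kernel hook: mark the entry so the adapter must ask before using it.
        if vpage in self.table:
            frame, _ = self.table[vpage]
            self.table[vpage] = (frame, False)

    def dma_access(self, vpage):
        frame, valid = self.table[vpage]
        if not valid:
            # Ask the kernel to fault the page in, then program the new mapping.
            frame = self.kernel.fault_in(vpage)
            self.table[vpage] = (frame, True)
        return frame
```

    In the toy, page_out() cannot proceed until invalidate() has returned.
    With real cards there is a window between the kernel's decision and the
    adapter noticing it, and any protocol has to be defined around that.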

    Secondly, there is the problem that an advertised buffer is implicitly a
    promise to the peer that the buffer is available. Using RNRs (or dropping
    TCP segments for iWARP) while paging an image from disk is just not
    playing fair. No host should advertise 20 GB of buffers to its peer when it
    only has 2 GB of physical memory backing it up. When an application
    registers memory it believes it has permission from the OS to advertise
    buffers within it. RNRs are appropriate while memory is being moved
    around, not as a way to let a host overadvertise.
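
    To make the accounting concrete, a minimal sketch in Python of a host
    that refuses to advertise more than it can back. The AdvertisementBudget
    class and the policy of capping at total physical memory are assumptions
    made for the example; a real host would presumably reserve well under
    that.

```python
GB = 2 ** 30


class AdvertisementBudget:
    """Hypothetical per-host cap on bytes advertised to peers."""

    def __init__(self, physical_bytes):
        self.physical_bytes = physical_bytes
        self.advertised = 0

    def advertise(self, nbytes):
        # Refuse any advertisement that physical memory could not back.
        if self.advertised + nbytes > self.physical_bytes:
            return False
        self.advertised += nbytes
        return True

    def release(self, nbytes):
        # Called when an advertised buffer is consumed or withdrawn.
        self.advertised = max(0, self.advertised - nbytes)


budget = AdvertisementBudget(2 * GB)
ok_small = budget.advertise(1 * GB)    # fits within the 2 GB backing
ok_large = budget.advertise(20 * GB)   # would overadvertise, so refused
```

    The point is only that the check happens at advertisement time, so the
    peer never sees a promise the host cannot keep.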