[openib-general] Re: RDMA memory registration
Caitlin Bestler
caitlin.bestler at gmail.com
Fri Apr 29 14:42:55 PDT 2005
Oops, hit send too soon. Finishing the response...
On 4/29/05, Caitlin Bestler <caitlin.bestler at gmail.com> wrote:
> On 4/29/05, Roland Dreier <roland at topspin.com> wrote:
> > Bill> I'm very confused at this point. Can you briefly explain how
> > Bill> this works, or point me to a description? I don't see how
> > Bill> you could do user level I/O without registering the memory
> > Bill> with the hardware. I'm especially confused by the comment
> > Bill> (may not have been yours) that the memory doesn't have to be
> > Bill> pinned. -- Bill Jordan InfiniCon Systems
> >
> > You add a hook to the kernel so it tells you if a page is about to be
> > paged out or otherwise move. Then you set a bit in the adapter's page
> > table so that it won't try to access that page without telling you.
> > If the adapter asks for the page, you get the kernel to fault the page
> > in and program the new physical mapping in the adapter.
> >
>
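The mechanism Roland describes can be sketched as a toy Python model. Everything here is hypothetical: `AdapterPTE`, `Host`, `Adapter`, `page_about_to_move` and `translate` are illustrative names, not a real kernel or verbs API, and "faulting in" is just a dictionary update. The sketch only shows the ordering: clear the adapter's valid bit before the page moves, and on the adapter's next access fault the page back in and program the new physical mapping.

```python
from dataclasses import dataclass, field

@dataclass
class AdapterPTE:
    """Hypothetical adapter page-table entry: physical frame plus valid bit."""
    pfn: int
    valid: bool = True

@dataclass
class Host:
    """Toy model of the kernel side; all names are illustrative only."""
    frames: dict = field(default_factory=dict)  # vpage -> pfn; absent if paged out
    next_free: int = 100

    def fault_in(self, vpage: int) -> int:
        # Bring the page back into memory, possibly at a new physical frame.
        if vpage not in self.frames:
            self.frames[vpage] = self.next_free
            self.next_free += 1
        return self.frames[vpage]

class Adapter:
    def __init__(self, host: Host, npages: int):
        self.host = host
        self.pt = {v: AdapterPTE(host.fault_in(v)) for v in range(npages)}

    def page_about_to_move(self, vpage: int) -> None:
        # Kernel hook: clear the valid bit *before* the frame is reclaimed,
        # so the adapter cannot DMA into the old frame.
        self.pt[vpage].valid = False
        self.host.frames.pop(vpage, None)

    def translate(self, vpage: int) -> int:
        # On an invalid entry, "tell the driver": fault the page in and
        # program the new physical mapping into the adapter.
        pte = self.pt[vpage]
        if not pte.valid:
            pte.pfn = self.host.fault_in(vpage)
            pte.valid = True
        return pte.pfn

host = Host()
nic = Adapter(host, npages=4)  # pages 0..3 land in frames 100..103
nic.page_about_to_move(2)      # page 2 is paged out
print(nic.translate(2))        # faulted back in at a new frame: 104
```

Note that the page ends up at a different physical frame than before, which is exactly why the adapter's table has to be reprogrammed rather than simply re-enabled.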
> Yes, and you could even have a system that was capable of doing
> DMA to a user virtual map (in fact some minis back around 1980
> had exactly that capability).
>
> But there are *two* issues involved here:
>
> One is that the RDMA hardware, however it is marketed, essentially
> needs to act as an MMU. That means that it has to be synchronized
> with the normal MMU. The traditional sledgehammer approach to
>
"synchronizing" is to require that the mapping be frozen. You *could*
define a method that is more dynamic about this synchronization, but
since it would be an ex post facto mechanism that must work with multiple
hardware cards, it needs to be defined with the recognition that it is not
instantaneous.
It is virtually the same problem as memory suspend in general: the RDMA
hardware's MMU does not redo its address translation for each and every
access to the host bus.
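One way to see why this synchronization cannot be instantaneous is a toy adapter that caches translations and already has DMAs in flight. The two-phase invalidate used here (drop the cached entry, then wait until no issued transfer still targets the page) is an assumption made for illustration; `CachingAdapter` and its methods are hypothetical names, not any vendor's interface.

```python
from collections import deque

class CachingAdapter:
    """Toy RDMA adapter with a translation cache and queued DMAs.
    The two-phase invalidate protocol is an illustrative assumption."""

    def __init__(self):
        self.tlb = {}             # vpage -> pfn, used without asking the host MMU
        self.in_flight = deque()  # (vpage, pfn) transfers already issued
        self.pending_inval = set()

    def start_dma(self, vpage, pfn):
        # Refuse new transfers to a page whose invalidation is in progress.
        if vpage in self.pending_inval:
            return False
        pfn = self.tlb.setdefault(vpage, pfn)  # a cached entry wins over the caller's pfn
        self.in_flight.append((vpage, pfn))
        return True

    def complete_one_dma(self):
        if self.in_flight:
            self.in_flight.popleft()

    def request_invalidate(self, vpage):
        # Phase 1: drop the cached translation and mark the page as pending.
        self.tlb.pop(vpage, None)
        self.pending_inval.add(vpage)

    def invalidate_acked(self, vpage):
        # Phase 2: the host may reuse the frame only once no issued DMA
        # still targets the page.
        if vpage not in self.pending_inval:
            return False
        if any(v == vpage for v, _ in self.in_flight):
            return False
        self.pending_inval.discard(vpage)
        return True

nic = CachingAdapter()
nic.start_dma(7, 500)
nic.start_dma(7, 500)
nic.request_invalidate(7)
print(nic.invalidate_acked(7))  # False: two transfers still target frame 500
nic.complete_one_dma()
nic.complete_one_dma()
print(nic.invalidate_acked(7))  # True: now safe for the kernel to move the page
```

The gap between `request_invalidate` and a successful `invalidate_acked` is the window any dynamic scheme has to tolerate, and its length depends on each card's hardware.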
Secondly, there is the problem that an advertised buffer is implicitly a
promise to the peer that the buffer is available. Using RNRs (Receiver Not
Ready NAKs), or dropping TCP segments for iWARP, while paging an image in
from disk is just not playing fair. No host should advertise 20 GB of buffers
to its peer when it only has 2 GB of physical memory backing them. When an
application registers memory, it believes it has permission from the OS to
advertise buffers within it. RNRs are appropriate for moving memory around,
not for allowing a host to over-advertise.
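The "no over-advertising" rule can be sketched as an admission check at registration time. `AdvertisementLedger` and its simple byte-count policy are hypothetical; real systems enforce something looser (Linux, for example, caps each process's locked memory with RLIMIT_MEMLOCK). The check captures the point that a registration is a promise that must be backed by real memory.

```python
GIB = 1 << 30

class AdvertisementLedger:
    """Hypothetical host-side check: never advertise more registered buffer
    space than physical memory can back. Policy and names are illustrative."""

    def __init__(self, backing_bytes):
        self.backing_bytes = backing_bytes
        self.advertised = 0

    def register(self, nbytes):
        # Registration is where the OS grants permission to advertise,
        # so refuse here rather than over-promise to the peer.
        if self.advertised + nbytes > self.backing_bytes:
            return False
        self.advertised += nbytes
        return True

    def deregister(self, nbytes):
        self.advertised = max(0, self.advertised - nbytes)

host = AdvertisementLedger(backing_bytes=2 * GIB)
print(host.register(1 * GIB))   # True
print(host.register(20 * GIB))  # False: 20 GB cannot be backed by 2 GB
```

Under this policy, RNRs would only ever cover the brief window while a page is being moved, never a shortfall of physical memory.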