[openib-general] Re: RDMA memory registration

Caitlin Bestler caitlin.bestler at gmail.com
Tue May 3 09:56:30 PDT 2005


An ex post facto notification of a PTE change would enable the RDMA
device driver to know when a Memory Region had been invalidated,
so that it could properly declare an access violation and tear down
all the connections using it.

But if the intent is to allow it to migrate the memory region to the
new mapping, it would need a more synchronized notice. It needs
to be told of a *pending* change, so that it can indicate when it
has completed any data movements based on the old mapping.
It can then use the new mapping. This has generally been discussed
as a two-part interface: suspend (to request that the old mapping
no longer be used) and resume (to resume usage of the mapping
with the new values), and it is generally done at Memory Region
scope rather than on a per-PTE basis.
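
To make the suspend/resume idea concrete, here is a minimal sketch in
Python. It is a toy model, not driver code: the MemoryRegion class, its
fields, and the start_dma/finish_dma helpers are all names invented for
illustration. The only things taken from the discussion are the two
operations (suspend, resume) and the Memory Region scope.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRegion:
    """Toy model of a registered region (hypothetical, not a real driver struct)."""
    page_map: dict                                 # virtual page -> physical frame
    suspended: bool = False
    in_flight: set = field(default_factory=set)    # ids of DMAs using page_map

    def start_dma(self, dma_id):
        # New data movement is refused while the mapping is being changed.
        if self.suspended:
            return False
        self.in_flight.add(dma_id)
        return True

    def finish_dma(self, dma_id):
        self.in_flight.discard(dma_id)

    def suspend(self):
        """Stop new use of the old mapping; True once nothing still uses it."""
        self.suspended = True
        return not self.in_flight

    def resume(self, new_page_map):
        """Install the new mapping and allow data movement again."""
        if not self.suspended or self.in_flight:
            raise RuntimeError("resume before the old mapping was quiesced")
        self.page_map = dict(new_page_map)
        self.suspended = False


mr = MemoryRegion(page_map={0: 100, 1: 101})
mr.start_dma("write-1")
quiesced = mr.suspend()        # False: write-1 still targets the old frames
mr.finish_dma("write-1")
quiesced = mr.suspend()        # True: the PTEs can now be changed safely
mr.resume({0: 200, 1: 101})    # page 0 has migrated to a new frame
```

The point of the sketch is the ordering: the mapping changes only between
a suspend that reports "quiesced" and the matching resume, so no data
movement ever straddles the old and new values.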

RDMA has strict ordering requirements. In particular, completing
a receive work request represents a guarantee to the consumer
that all prior writes have been placed in its buffer. With an
unsynchronized notice that "PTE entry X has been changed"
I don't see how the device can fulfill those semantics. It cannot know if
portions of an RDMA Write were placed to the old physical
location, and therefore it cannot know that the entire RDMA
Write payload will be in user memory at the anticipated locations
when it generates the work completion. If it cannot make that
guarantee it is obligated to terminate the connection.
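
The failure case can be sketched with another toy model in Python. The
functions place_write and can_complete and the remap_after parameter are
hypothetical names for this example only. Note that the model records
every placement, which is exactly the knowledge a real adapter lacks
after an unsynchronized notice; the sketch only shows which condition
gets violated.

```python
def place_write(payload_pages, page_map, memory, remap_after=None):
    """Place one RDMA Write page by page.

    remap_after = (n, vpage, new_frame) simulates an unsynchronized PTE
    change that takes effect after n pages have already been placed.
    Returns the set of (vpage, frame) placements actually performed.
    """
    placed = set()
    for i, (vpage, data) in enumerate(payload_pages):
        if remap_after and i == remap_after[0]:
            _, changed_vpage, new_frame = remap_after
            page_map[changed_vpage] = new_frame
        frame = page_map[vpage]
        memory[frame] = data
        placed.add((vpage, frame))
    return placed


def can_complete(placed, page_map):
    """True only if every placement matches the current mapping."""
    return all(page_map[vpage] == frame for vpage, frame in placed)


payload = [(0, "A"), (1, "B")]

# No remap: both pages land where the consumer will look for them.
page_map, memory = {0: 100, 1: 101}, {}
ok = can_complete(place_write(payload, page_map, memory), page_map)

# Page 0 is remapped to frame 200 after it was already written to frame 100.
page_map, memory = {0: 100, 1: 101}, {}
placed = place_write(payload, page_map, memory, remap_after=(1, 0, 200))
stale = not can_complete(placed, page_map)
```

In the second run the payload for page 0 sits in frame 100 while the
consumer now sees frame 200, so a work completion would be a false
promise and terminating the connection is the only safe response.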


On 5/3/05, David Addison <addy at quadrics.com> wrote:
> Ronald G. Minnich wrote:
> >
> > On Fri, 29 Apr 2005, Greg Lindahl wrote:
> >
> >>It doesn't imply that there's an MMU, either. I know that Myricom uses a
> >>little lookup routine in software on their nic, which most people
> >>wouldn't call an MMU. I don't know what Mellanox does for this, they
> >>don't talk much about what's hardware and what's software on their nic.
> >>I think Quadrics actually uses the TLB of their risc cpu on their nic
> >>for this lookup, but that's just a guess.
> >
> > but only quadrics rewrites the mm layer code ..
> >
> >
> Hi Ron,
> as our recent IOPROC patch on lkml shows, it's not that invasive. There
> are just 24 hooks added to the Linux VM code paths - which we have been able to
> maintain outside the mainline tree for many years now.
> As these hooks only need to synchronise the Elan's MMU state with that of the
> CPU, the device driver calls don't change the Linux MM behaviour.
> 
> We believe the IOPROC patch is generic and powerful and would allow other
> RDMA NICs to solve the page registration problems in a different manner.
> For NICs which require page registration, new VM hooks can be used to avoid
> pages being unloaded whilst DMAs are active. Our latest cut of the IOPROC patch
> has such a hook.
> 
> Cheers
> Addy.


