[ofa-general] Memory registration redux

Roland Dreier rdreier at cisco.com
Wed May 6 14:56:25 PDT 2009


 > Yuk, doesn't this problem pretty much doom this method entirely? You
 > can't tear down the entire registration of 0x1000 ... 0x3fff if the app
 > does something to change 0x2000 .. 0x2fff because it may have active
 > RDMAs going on in 0x1000 ... 0x1fff.

Yes, I guess if we try to reuse registrations like this then we run into
trouble.  I think your example points to a problem if an app registers
0x1000...0x3fff and then we reuse that registration for 0x2000...0x2fff
and also for 0x1000...0x1fff, and then the app unregisters 0x1000...0x3fff.

But we can get around this just by not ever reusing registrations that
way -- only treat something as a cache hit if it matches the start and
length exactly.
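
To make the exact-match rule concrete, here is a minimal sketch in C. Everything in it is hypothetical and for illustration only: `struct mr_cache`, `mr_cache_insert()`, `mr_cache_lookup()`, the fixed slot count, and the integer `handle` that stands in for a real MR handle. None of it is taken from an existing library.

```c
#include <stddef.h>
#include <stdint.h>

#define MR_CACHE_SLOTS 16  /* arbitrary size chosen for the sketch */

/* Hypothetical cache entry; `handle` stands in for a real MR handle. */
struct mr_entry {
    uintptr_t start;
    size_t    len;
    int       handle;
    int       in_use;
};

struct mr_cache {
    struct mr_entry slot[MR_CACHE_SLOTS];
};

/* Record a registration. Returns 0 on success, -1 if the cache is full. */
int mr_cache_insert(struct mr_cache *c, uintptr_t start, size_t len, int handle)
{
    for (size_t i = 0; i < MR_CACHE_SLOTS; i++) {
        struct mr_entry *e = &c->slot[i];
        if (!e->in_use) {
            e->start  = start;
            e->len    = len;
            e->handle = handle;
            e->in_use = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Return the cached handle, or -1 on a miss.  A hit requires both the
 * start address and the length to match exactly; a cached registration
 * that merely contains the requested range is deliberately not a hit.
 */
int mr_cache_lookup(const struct mr_cache *c, uintptr_t start, size_t len)
{
    for (size_t i = 0; i < MR_CACHE_SLOTS; i++) {
        const struct mr_entry *e = &c->slot[i];
        if (e->in_use && e->start == start && e->len == len)
            return e->handle;
    }
    return -1;
}
```

With 0x1000...0x3fff cached (start 0x1000, length 0x3000), a lookup for 0x2000...0x2fff or for 0x1000...0x1fff misses, so neither sub-range ends up sharing the larger registration.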

 > What about a slightly different twist.. Instead of trying to make
 > everything synchronous in the mmu_notifier, just have a counter mapped
 > to user space. Increment the counter whenever the mms change from the
 > notifier. Pin the user page that contains the single counter upon
 > starting the process so access is lockless.
 > 
 > In user space, check the counter before every cache lookup and if it
 > has changed call back into the kernel to resynchronize the MR tables in
 > the HCA to the current VM.
 > 
 > Avoids the locking and racing problems, keeps the fast path in the
 > user space and avoids the above question about how to deal with
 > arbitrary actions?

I like the simplicity of the fast path.  But it seems the slow path
would be hard -- how exactly did you envision resynchronizing the MR
tables, considering that RDMAs might be in flight for MRs that weren't
changed by the MM operations?
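
For reference, the user-space fast path of the counter idea quoted above might look roughly like the sketch below. It assumes the kernel exposes a single monotonically increasing counter through a pinned, mapped page; `struct mr_gen_state` and `mr_gen_changed()` are hypothetical names, and the slow path (what the caller actually does to resynchronize) is left open here, just as it is in the discussion.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical per-process state.  `kernel_gen` stands in for a pointer
 * into the pinned page that the mmu_notifier side would increment; in
 * this sketch it can point at any atomic counter.
 */
struct mr_gen_state {
    _Atomic uint64_t *kernel_gen;  /* counter bumped on address-space changes */
    uint64_t          seen_gen;    /* value observed at the last check */
};

/*
 * Call before every cache lookup.  Returns 0 if the counter is unchanged
 * (the cache may be used as-is), or 1 if it moved since the last check,
 * in which case the caller has to go to the kernel and resynchronize
 * before trusting any cached registration.
 */
int mr_gen_changed(struct mr_gen_state *s)
{
    uint64_t now = atomic_load_explicit(s->kernel_gen, memory_order_acquire);

    if (now == s->seen_gen)
        return 0;

    /* Remember the value read above, so a later bump is seen next time. */
    s->seen_gen = now;
    return 1;
}
```

The check is a single load and compare with no locking, which is the attraction of the fast path. It does nothing for the in-flight RDMA question, and a change that lands between the check and the subsequent use of a cached registration is not caught until the next check.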

 - R.


