[ofa-general] Memory registration redux

Roland Dreier rdreier at cisco.com
Thu May 7 14:58:50 PDT 2009


 > I don't know what the other MPI's do in this scenario, but here's what
 > OMPI will do:
 > 
 > 1. lookup 0x1000-0x3fff in the cache; don't find any of it, and
 >    therefore register
 >    - add each page to our cache with a refcount of 1
 > 2. lookup 0x2000-0x2fff in the cache, find that all the pages are
 >    already registered
 >    - refcount++ on each page in the cache
 > 3. when we go to dereg 0x1000-0x3fff
 >    - refcount-- on each page in the cache
 >    - since some pages in the range still have refcount>0, don't do
 >      anything further
 > 
 > Specifically: the actual dereg of 0x1000-0x3fff is blocked on also
 > releasing 0x2000-0x2fff.
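The three steps above can be modeled as a per-page refcount table. This is a minimal sketch, not OMPI's actual code. The class and method names are hypothetical, a page size of 0x1000 is assumed, and the real registration and deregistration calls (e.g. ibv_reg_mr()/ibv_dereg_mr()) are replaced by counters.

```python
# Hypothetical model of the per-page refcount cache described above.
# Real registration calls are replaced by counters (assumption).
from collections import defaultdict

PAGE_SIZE = 0x1000  # assumed page size


class PageRefcountCache:
    def __init__(self):
        self.refcount = defaultdict(int)  # page index -> refcount
        self.hw_registrations = 0         # simulated "real" registrations
        self.hw_deregistrations = 0       # simulated "real" deregistrations

    @staticmethod
    def _pages(start, end):
        """Page indices covering the inclusive range [start, end]."""
        return range(start // PAGE_SIZE, end // PAGE_SIZE + 1)

    def register(self, start, end):
        pages = self._pages(start, end)
        if not all(self.refcount[p] > 0 for p in pages):
            self.hw_registrations += 1    # step 1: miss, so really register
        for p in pages:
            self.refcount[p] += 1         # steps 1 and 2: refcount++ per page

    def deregister(self, start, end):
        still_used = False
        for p in self._pages(start, end):
            self.refcount[p] -= 1         # step 3: refcount-- per page
            if self.refcount[p] > 0:
                still_used = True
        if not still_used:
            self.hw_deregistrations += 1  # no page in the range is in use


cache = PageRefcountCache()
cache.register(0x1000, 0x3FFF)    # step 1: miss
cache.register(0x2000, 0x2FFF)    # step 2: hit
cache.deregister(0x1000, 0x3FFF)  # step 3: page 0x2000 still referenced
print(cache.hw_registrations, cache.hw_deregistrations)  # prints: 1 0
cache.deregister(0x2000, 0x2FFF)  # releasing the sub-range unblocks the dereg
print(cache.hw_registrations, cache.hw_deregistrations)  # prints: 1 1
```

The real deregistration only fires once every page in the range is back at zero, which is the "blocked on also releasing 0x2000-0x2fff" behavior.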

If everyone is doing this, how do you handle the case that Jason pointed
out, namely:

 * you register 0x1000 ... 0x3fff
 * you want to register 0x2000 ... 0x2fff and have a cache hit
 * you finish up with 0x1000 ... 0x3fff
 * app does something (which is valid since you finished up with the
   bigger range) that invalidates the mapping at 0x3000 ... 0x3fff (e.g.
   a free() that leads to munmap() or whatever), and your hooks tell you so.
 * app reallocates a mapping in 0x3000 ... 0x3fff
 * you want to re-register 0x1000 ... 0x3fff -- but at this point it has
   to be marked both invalid and in-use in the cache!?
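Jason's sequence can be replayed against a region-level variant of such a cache to show where the conflict arises. This is a hypothetical sketch, not any MPI's implementation. Each entry is assumed to stand for one real registration with a reference count and a validity flag that the munmap() hook clears, and rather than pick a policy for the invalid-but-in-use case, it simply raises an error.

```python
# Hypothetical region-level registration cache used to replay the scenario.
from dataclasses import dataclass


@dataclass(eq=False)  # identity semantics: each Region is one registration
class Region:
    start: int
    end: int            # inclusive
    refs: int = 1       # users currently holding this registration
    valid: bool = True  # cleared when a hook reports the mapping changed


class RegCache:
    def __init__(self):
        self.regions = []

    def lookup(self, start, end):
        """Return a cached region covering [start, end], or None."""
        for r in self.regions:
            if r.start <= start and end <= r.end:
                return r
        return None

    def register(self, start, end):
        r = self.lookup(start, end)
        if r is None:
            r = Region(start, end)      # miss: "really" register
            self.regions.append(r)
            return r
        if not r.valid:
            # The state asked about above: stale, but still referenced.
            raise RuntimeError(f"entry {r.start:#x}-{r.end:#x} is both "
                               f"invalid and in use (refs={r.refs})")
        r.refs += 1                     # valid cache hit
        return r

    def release(self, r):
        r.refs -= 1
        if r.refs == 0:
            self.regions.remove(r)      # "really" deregister

    def invalidate(self, start, end):
        """munmap() hook: mark every overlapping region stale."""
        for r in self.regions:
            if r.start <= end and start <= r.end:
                r.valid = False


cache = RegCache()
big = cache.register(0x1000, 0x3FFF)    # miss: real registration
small = cache.register(0x2000, 0x2FFF)  # hit on the same region
cache.release(big)                      # done with the big range; refs 2 -> 1
cache.invalidate(0x3000, 0x3FFF)        # free() -> munmap(), reported by hooks
# (the app's new mapping at 0x3000 does not touch the cache in this model)
try:
    cache.register(0x1000, 0x3FFF)      # app wants the big range again
except RuntimeError as err:
    print(err)
```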

 - R.
