[ofa-general] Re: [PATCH 08 of 11] anon-vma-rwsem

Linus Torvalds torvalds at linux-foundation.org
Wed May 14 11:27:14 PDT 2008



On Wed, 14 May 2008, Christoph Lameter wrote:
> 
> The problem is that the code in rmap.c try_to_unmap() and friends loops 
> over reverse maps after taking a spinlock. The mm_struct is only known 
> after the rmap has been accessed. This means *inside* the spinlock.

So you queue them. That's what we do with things like the dirty bit. We 
need to hold various spinlocks to look up pages, but then we can't 
actually call the filesystem with the spinlock held.

Converting a spinlock to a waiting lock for things like that is simply not 
acceptable. You have to work with the system.

Yeah, there's only a single bit worth of information on whether a page is 
dirty or not, so "queueing" that information is trivial (it's just the 
return value from "page_mkclean_file()"). Some things are harder than 
others, and I suspect you need some kind of "gather" structure to queue up 
all the vma's that can be affected.

But it sounds like for the case of rmap, the approach would be:

 - the page lock is the higher-level "sleeping lock" (which makes sense, 
   since this is very close to an IO event, and that is what the page lock 
   is generally used for)

   But hey, it could be anything else - maybe you have some other even 
   bigger lock to allow you to handle lots of pages in one go.

 - with that lock held, you do the whole rmap dance (which requires 
   spinlocks) and gather up the vma's and the struct mm's involved. 

 - outside the spinlocks you then do whatever it is you need to do.

This doesn't sound all that different from TLB shoot-down in SMP, and the 
"mmu_gather" structure. Now, admittedly we can do the TLB shoot-down while 
holding the spinlocks, but if we couldn't that's how we'd still do it: 
it would get more involved (because we'd need to guarantee that the gather 
can hold *all* the pages - right now we can just flush in the middle if we 
need to), but it wouldn't be all that fundamentally different.
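
A minimal sketch of that gather-then-act pattern, written as userspace C 
rather than kernel code. Every name here (struct mm, struct mapping, 
struct rmap, struct gather, gather_mms(), notify_gathered(), GATHER_MAX) 
is a hypothetical stand-in and not a kernel API, and the spinlock is a 
simple C11 atomic_flag loop. It assumes the outer sleeping lock (for 
example the page lock) is already held by the caller and that something 
else keeps the gathered objects alive once the spinlock is dropped; real 
code would have to take references.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define GATHER_MAX 16            /* hypothetical fixed capacity */

struct mm { int id; int notified; };   /* stand-in for struct mm_struct */
struct mapping { struct mm *mm; };     /* stand-in for one reverse-map entry */

struct rmap {
    atomic_flag lock;            /* toy spinlock protecting maps[] */
    struct mapping *maps;
    size_t nr;
};

struct gather {
    struct mm *mms[GATHER_MAX];  /* pointers collected under the spinlock */
    size_t nr;
};

static void spin_lock(atomic_flag *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Phase 1: walk the reverse map under the spinlock and only record
 * pointers. Nothing here may sleep. Returns 0 on success, or -1 if the
 * gather is too small; we cannot "flush in the middle" because the
 * flush is exactly the part that must run outside the spinlock.
 */
static int gather_mms(struct rmap *r, struct gather *g)
{
    int ret = 0;

    g->nr = 0;
    spin_lock(&r->lock);
    for (size_t i = 0; i < r->nr; i++) {
        if (g->nr == GATHER_MAX) {
            ret = -1;
            break;
        }
        g->mms[g->nr++] = r->maps[i].mm;
    }
    spin_unlock(&r->lock);
    assert(g->nr <= GATHER_MAX);
    return ret;
}

/* Phase 2: the spinlock is no longer held, so cb() is free to sleep. */
static void notify_gathered(struct gather *g, void (*cb)(struct mm *))
{
    for (size_t i = 0; i < g->nr; i++)
        cb(g->mms[i]);
}

/* Example callback standing in for the work that may block. */
static void mark_notified(struct mm *mm)
{
    mm->notified = 1;
}
```

A caller would take the sleeping lock, call gather_mms(), and on success 
call notify_gathered() with the callback that may block. On -1 it has to 
retry with a larger gather or fall back to some other path, which is the 
"guarantee that the gather can hold *all* the pages" part.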

And no, I really haven't even wanted to look at what XPMEM really needs to 
do, so maybe the above thing doesn't work for you, and you have other 
issues. I'm just pointing you in a general direction, not trying to say 
"this is exactly how to get there". 

		Linus


