[ofa-general] Re: [PATCH 08 of 11] anon-vma-rwsem

Nick Piggin npiggin at suse.de
Tue May 13 21:11:22 PDT 2008


On Tue, May 13, 2008 at 10:32:38AM -0500, Robin Holt wrote:
> On Tue, May 13, 2008 at 10:06:44PM +1000, Nick Piggin wrote:
> > On Thursday 08 May 2008 10:38, Robin Holt wrote:
> > > In order to invalidate the remote page table entries, we need to message
> > > (uses XPC) to the remote side.  The remote side needs to acquire the
> > > importing process's mmap_sem and call zap_page_range().  Between the
> > > messaging and the acquiring a sleeping lock, I would argue this will
> > > require sleeping locks in the path prior to the mmu_notifier invalidate_*
> > > callouts().
> > 
> > Why do you need to take mmap_sem in order to shoot down pagetables of
> > the process? It would be nice if this can just be done without
> > sleeping.
> 
> We are trying to shoot down page tables of a different process running
> on a different instance of Linux running on Numa-link connected portions
> of the same machine.

Right. You can zap page tables without sleeping, if you're careful. I
don't know that we quite do that for anonymous pages at the moment, but it
should be possible with a bit of thought, I believe.

 
> The messaging is clearly going to require sleeping.  Are you suggesting
> we need to rework XPC communications to not require sleeping?  I think
> that is going to be impossible since the transfer engine requires a
> sleeping context.

I guess that you have found a way to perform TLB flushing within coherent
domains over the numalink interconnect without sleeping. I'm sure it would
be possible to send similar messages between non coherent domains.

So yes, I'd much rather rework such highly specialized system to fit in
closer with Linux than rework Linux to fit with these machines (and
apparently slow everyone else down).

 
> Additionally, the call to zap_page_range expects to have the mmap_sem
> held.  I suppose we could use something other than zap_page_range and
> atomically clear the process page tables.

zap_page_range does not expect to have mmap_sem held. I think for anon
pages it is always called with mmap_sem, however try_to_unmap_anon is
not (although it expects page lock to be held, I think we should be able
to avoid that).


>  Doing that will not alleviate
> the need to sleep for the messaging to the other partitions.

No, but I'd venture to guess that is not impossible to implement even
on your current hardware (maybe a firmware update is needed)?




More information about the general mailing list