[ofa-general] Re: [PATCH 08 of 11] anon-vma-rwsem
Andrea Arcangeli
andrea at qumranet.com
Thu May 8 15:01:06 PDT 2008
On Thu, May 08, 2008 at 09:11:33AM -0700, Linus Torvalds wrote:
> Btw, this is an issue only on 32-bit x86, because on 64-bit one we already
> have the padding due to the alignment of the 64-bit pointers in the
> list_head (so there's already empty space there).
>
> On 32-bit, the alignment of list-head is obviously just 32 bits, so right
> now the structure is "perfectly packed" and doesn't have any empty space.
> But that's just because the spinlock is unnecessarily big.
>
> (Of course, if anybody really uses NR_CPUS >= 256 on 32-bit x86, then the
> structure really will grow. That's a very odd configuration, though, and
> not one I feel we really need to care about).
I see two ways to implement it:
1) use #ifdef, which makes it zero overhead on 64bit only, without
playing any non-obvious tricks.
	struct anon_vma {
		spinlock_t lock;
#ifdef CONFIG_MMU_NOTIFIER
		int global_mm_lock:1;
#endif

	struct address_space {
		spinlock_t private_lock;
#ifdef CONFIG_MMU_NOTIFIER
		int global_mm_lock:1;
#endif
2) add a:
#define AS_GLOBAL_MM_LOCK (__GFP_BITS_SHIFT + 2) /* global_mm_locked */
and use address_space->flags with bitops
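A rough user-space sketch of option 2, only to illustrate the idea. C11 atomics stand in for the kernel bitops, struct as_sketch and the helper names are hypothetical, and the value used for __GFP_BITS_SHIFT is a placeholder rather than something taken from the kernel headers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder value: the real __GFP_BITS_SHIFT comes from the kernel headers. */
#define SKETCH_GFP_BITS_SHIFT 21
#define AS_GLOBAL_MM_LOCK (SKETCH_GFP_BITS_SHIFT + 2) /* global_mm_locked */

/* Hypothetical stand-in for struct address_space: only the flags word. */
struct as_sketch {
	atomic_ulong flags;
};

/* Atomically set the bit and return its previous value. */
static inline bool as_test_and_set_global_lock(struct as_sketch *as)
{
	unsigned long mask = 1UL << AS_GLOBAL_MM_LOCK;

	return (atomic_fetch_or(&as->flags, mask) & mask) != 0;
}

static inline bool as_global_lock_is_set(struct as_sketch *as)
{
	return (atomic_load(&as->flags) & (1UL << AS_GLOBAL_MM_LOCK)) != 0;
}

/* Atomically clear the bit; the other flag bits are left untouched. */
static inline void as_clear_global_lock(struct as_sketch *as)
{
	unsigned long mask = 1UL << AS_GLOBAL_MM_LOCK;

	assert(as_global_lock_is_set(as)); /* clearing an unset bit is a bug */
	atomic_fetch_and(&as->flags, ~mask);
}
```

Because the bit lives in the existing flags word, no new field is needed in the structure on either 32bit or 64bit.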
And as Andrew pointed out to me by PM, for the anon_vma we can use the
LSB of list.next/prev, because the list can't be browsed while the
lock is taken, so taking the lock, setting the bit, and clearing the
bit before unlocking is safe. The LSB will always read 0, even while
the list is under list_add modification, when the global spinlock
isn't taken. And after taking the anon_vma lock we can switch the LSB
from 0 to 1 without races, and the 1 will be protected by the
global spinlock.
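A minimal user-space sketch of the LSB idea, assuming list_head pointers are at least 2-byte aligned so the low bit of a valid pointer is always 0. The helper names are hypothetical, the locking itself is not shown, and the helpers are meant to be called only with the relevant lock held:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's struct list_head, redefined for user space. */
struct list_head {
	struct list_head *next, *prev;
};

/* Tag the list: set the low bit of head->next (hypothetical helper). */
static inline void list_set_lsb(struct list_head *head)
{
	uintptr_t next = (uintptr_t)head->next;

	assert((next & 1) == 0); /* an untagged, aligned pointer is even */
	head->next = (struct list_head *)(next | 1);
}

/* Untag the list: restore the original pointer before unlocking. */
static inline void list_clear_lsb(struct list_head *head)
{
	head->next = (struct list_head *)((uintptr_t)head->next & ~(uintptr_t)1);
}

static inline bool list_lsb_is_set(const struct list_head *head)
{
	return ((uintptr_t)head->next & 1) != 0;
}
```

While the bit is set, head->next is not a valid pointer, so nothing may walk the list until list_clear_lsb() has run. That is why the bit may only be set and cleared with the lock held.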
The above solution is zero cost for 32bit too, so I prefer it.
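For comparison, the cost of the #ifdef variant can be estimated in user space with a sketch like the one below. The types are hypothetical stand-ins, and the lock is assumed to be 4 bytes, which may not match a given kernel config:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};
typedef struct { uint32_t slock; } spinlock_sketch_t; /* assumed 4-byte lock */

struct anon_vma_plain {
	spinlock_sketch_t lock;
	struct list_head_sketch head;
};

struct anon_vma_flagged {
	spinlock_sketch_t lock;
	unsigned int global_mm_lock:1; /* the extra bitfield from option 1 */
	struct list_head_sketch head;
};

static_assert(sizeof(struct anon_vma_flagged) >= sizeof(struct anon_vma_plain),
	      "adding a field cannot shrink the struct");

/* Extra bytes the bitfield costs on the target this is compiled for. */
static inline size_t flag_overhead(void)
{
	return sizeof(struct anon_vma_flagged) - sizeof(struct anon_vma_plain);
}
```

With 8-byte pointers the bitfield should fit into the padding in front of the list head, so flag_overhead() would be 0. With 4-byte pointers there is no such padding and the struct would be expected to grow by one word.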
So I now agree with you that this is a great way to remove sort(),
vmalloc and especially vfree without increasing the VM footprint.
I'll send an update with this for review very shortly and I hope this
goes in so KVM will be able to swap and do many other things very well
starting in 2.6.26.
Thanks a lot,
Andrea