[ofa-general] Re: [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier methods to sleep (#v9 was 1/4)

Fri Mar 7 12:00:26 PST 2008

On Fri, 7 Mar 2008, Andrea Arcangeli wrote:

> This combines the non-sleep-capable RCU locking of #v9 with a seqlock
> so the mmu notifier fast path will require zero cacheline
> writes/bouncing while still providing mmu_notifier_unregister and
> allowing to schedule inside the mmu notifier methods. If we drop
> mmu_notifier_unregister we can as well drop all seqlock and
> rcu_read_lock()s. But this locking scheme combination is sexy enough
> and 100% scalable (the mmu_notifier_list cacheline will be preloaded
> anyway and that will most certainly include the sequence number value
> in l1 for free even in Christoph's NUMA systems) so IMHO it worth to
> keep mmu_notifier_unregister.

Well its adds lots of processing. Not sure if its really worth it. Seems 
that this scheme cannot work since the existence of the structure passed 
to the callbacks is not guaranteed since the RCU locks are not held. You 
need some kind of a refcount to give the existence guarantee.

> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -20,7 +20,9 @@ void __mmu_notifier_release(struct mm_st
>  void __mmu_notifier_release(struct mm_struct *mm)
>  {
>  	struct mmu_notifier *mn;
> +	unsigned seq;
>  
> +	seq = read_seqbegin(&mm->mmu_notifier_lock);
>  	while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) {
>  		mn = hlist_entry(mm->mmu_notifier_list.first,
>  				 struct mmu_notifier,
> @@ -28,6 +30,7 @@ void __mmu_notifier_release(struct mm_st
>  		hlist_del(&mn->hlist);
>  		if (mn->ops->release)
>  			mn->ops->release(mn, mm);
> +		BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq));
>  	}
>  }

So this is only for sanity checking? The BUG_ON detects concurrent 
operations that should not happen? Need a comment here.

> @@ -42,11 +45,19 @@ int __mmu_notifier_clear_flush_young(str
>  	struct mmu_notifier *mn;
>  	struct hlist_node *n;
>  	int young = 0;
> +	unsigned seq;
>  
>  	rcu_read_lock();
> +restart:
> +	seq = read_seqbegin(&mm->mmu_notifier_lock);
>  	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) {
> -		if (mn->ops->clear_flush_young)
> +		if (mn->ops->clear_flush_young) {
> +			rcu_read_unlock();
>  			young |= mn->ops->clear_flush_young(mn, mm, address);
> +			rcu_read_lock();
> +		}
> +		if (read_seqretry(&mm->mmu_notifier_lock, seq))
> +			goto restart;

Great innovative idea of the seqlock for versioning checks.

>  	}
>  	rcu_read_unlock();
>  

Well that gets pretty sophisticated here. If you drop the rcu lock then 
the entity pointed to by mn can go away right? So how can you pass that 
structure to clear_flush_young? What is guaranteeing the existence of the 
structure?

> @@ -58,11 +69,19 @@ void __mmu_notifier_invalidate_page(stru
>  {
>  	struct mmu_notifier *mn;
>  	struct hlist_node *n;
> +	unsigned seq;
>  
>  	rcu_read_lock();
> +restart:
> +	seq = read_seqbegin(&mm->mmu_notifier_lock);
>  	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) {
> -		if (mn->ops->invalidate_page)
> +		if (mn->ops->invalidate_page) {
> +			rcu_read_unlock();
>  			mn->ops->invalidate_page(mn, mm, address);

Ditto structure can vanish since no existence guarantee exists.