[openib-general] Re: [PATCH] process locked in D state.

Gleb Natapov glebn at voltaire.com
Tue Jun 28 00:50:57 PDT 2005


On Tue, Jun 28, 2005 at 10:22:22AM +0300, Michael S. Tsirkin wrote:
> Quoting r. Roland Dreier <rolandd at cisco.com>:
> > Subject: Re: [PATCH] process locked in D state.
> > 
> > Something like this should work...
> 
> Note that there's a window when close() has returned but
> vm_locked isnt updated yet.
This is the case only if close() can return before workqueues are ran.
Is this possible? If yes perhaps it is better to use tascklets.

> Actually, a process that doesnt close memory regions before
> closing the device will leak (virtual) memory anyway.
This is not the kernel problem.

> 
> The only interesting case is of the process exiting, in which case mm is
> NULL.
> 
> So I propose to kill all this code.
By your patch you completely removed locked_vm accounting on deregister
path. I don't think this was your intention.

I think kernel should cleanup everything it can after close().
If user call ibv_close_device() he expects that the process is in
the state such as ibv_open_device() was never called in the first place.

> 
> Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> 
> Index: core/uverbs_mem.c
> ===================================================================
> --- core/uverbs_mem.c	(revision 2724)
> +++ core/uverbs_mem.c	(working copy)
> @@ -180,40 +180,7 @@ static void ib_umem_account(void *work_p
>  
>  void ib_umem_release(struct ib_device *dev, struct ib_umem *umem)
>  {
> -	struct mm_struct *mm;
> -
> +	/* A well-behaved application will close regions before
> +	 * closing the file. Thus we dont have to update mm->locked_vm here. */
>  	__ib_umem_release(dev, umem, 1);
> -
> -	mm = get_task_mm(current);
> -	if (mm) {
> -		/*
> -		 * We may be called with the mm's mmap_sem already
> -		 * held.  This can happen when a userspace munmap() is
> -		 * the call that drops the last reference to our file
> -		 * and calls our release method.  If there are memory
> -		 * regions to destroy, we'll end up here and not be
> -		 * able to take the mmap_sem.
> -		 *
> -		 * To handle this, we try to grab the mmap_sem, and if
> -		 * we can't get it immediately, we defer the
> -		 * accounting to the system workqueue.
> -		 */
> -		if (down_write_trylock(&mm->mmap_sem)) {
> -			mm->locked_vm -= PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
> -			up_write(&mm->mmap_sem);
> -			mmput(mm);
> -		} else {
> -			struct ib_umem_account_work *work;
> -
> -			work = kmalloc(sizeof *work, GFP_KERNEL);
> -			if (!work)
> -				return;
> -
> -			INIT_WORK(&work->work, ib_umem_account, work);
> -			work->mm   = mm;
> -			work->diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
> -
> -			schedule_work(&work->work);
> -		}
> -	}
>  }
> 
> -- 
> MST

--
			Gleb.



More information about the general mailing list