[ofa-general] [PATCH for 2.6.28] core: fix memory leak in XRC userspace cleanup.

Jack Morgenstein jackm at dev.mellanox.co.il
Tue Aug 12 08:31:03 PDT 2008


core: fix memory leak in XRC userspace cleanup.
    
When userspace invoked close_xrc_domain, the domain was deleted from
the active-domains list for that process -- even if the process had
active QPs and SRQs using that domain.
    
The domain itself, however, was not destroyed, and its resources were
still allocated.
    
Because the domain had already been deleted from the process's
active-domains list, no attempt was made to destroy it during process
cleanup. If this was the last process using the domain, the domain's
resources remained allocated forever (a memory/resource leak).
    
The fix is to avoid deleting the domain from the active list if the
process still has outstanding resources (QPs or SRQs) that use it
(the same approach already taken for XRC_RCV QPs and XRC QP registration).
    
Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>
---
Roland, this is a provisional fix for the memory leak that I committed to the
ofed 1.4 git tree.  A better fix would be to add a usecnt field to
ib_xrcd_uobject and increment/decrement it when creating/destroying XRC QPs
and XRC SRQs.

To do this, I need to save the XRC domain handle in the ib_uqp_object so
that it is available when the QP is destroyed, in order to decrement the
domain's use count (no problem -- such an object already exists), roughly
as in the sketch below.
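
Something along these lines on the QP side -- just a sketch to show the
idea; the usecnt and xrcd_handle fields do not exist today, and the
locking/lookup details are omitted:

/* ib_uxrcd_object gains a reference count: */
struct ib_uxrcd_object {
	struct ib_uobject	uobject;
	struct list_head	xrc_reg_qp_list;
	atomic_t		usecnt;	/* new: QPs/SRQs still using this domain */
};

/* ib_uverbs_create_qp(), once the XRC domain lookup succeeds:
 * remember the domain handle in the (new) obj->xrcd_handle field
 * and take a reference:
 */
	atomic_inc(&xrcd_uobj->usecnt);

/* ib_uverbs_destroy_qp() / QP cleanup: look the domain back up via
 * obj->xrcd_handle and drop the reference:
 */
	atomic_dec(&xrcd_uobj->usecnt);

/* ib_uverbs_close_xrc_domain() then reduces to: */
	if (atomic_read(&xrcd_uobj->usecnt)) {
		ret = -EBUSY;
		put_uobj_write(uobj);
		goto err_unlock_mutex;
	}
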
However, I need to do the same thing for SRQs, and that generates a
"falling domino" set of changes.  There is no separate SRQ uobject -- SRQs
use ib_uevent_object, and I was reluctant to add an XRC domain handle to
ib_uevent_object.  Furthermore, since there is no separate destroy_xrc_srq()
function, I need to know whether we are destroying an XRC or a non-XRC SRQ.
(I could possibly use an XRC handle of -1 to indicate that this is not an
XRC SRQ.)
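
With the handle = -1 trick, the SRQ destroy/cleanup path would only need a
check along these lines (sketch only; where the handle actually gets stored
is exactly the open question above):

	/* xrcd_handle would be set to (u32) -1 at create time for non-XRC SRQs */
	if (obj->xrcd_handle != (u32) -1) {
		/* look the domain up by handle and drop our reference */
		atomic_dec(&xrcd_uobj->usecnt);
	}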

I think the correct thing to do is to define an ib_srq_uobject to replace
the use of ib_uevent_object for SRQs, and to deal with the various changes
that entails.
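
If we go that route, the new uobject could start out as simple as this
(sketch; none of it exists yet):

struct ib_srq_uobject {
	struct ib_uevent_object	uevent;		/* keeps today's SRQ event handling */
	u32			xrcd_handle;	/* only meaningful for XRC SRQs */
};

The create/destroy paths would then container_of() their way from the
uobject to the handle, the same way the QP paths use ib_uqp_object.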

I am uncomfortable adding a "falling dominoes" change to the core to fix this
bug just before the beta. However, since I know you're thinking about some
major changes to XRC (migrating functionality from mlx4 to the core), I'm just
alerting you to the leak problem.

diff --git a/kernel_patches/fixes/core_0160_xrc_fix_memleak.patch b/kernel_patches/fixes/core_0160_xrc_fix_memleak.patch
new file mode 100644
index 0000000..80a51c3
--- /dev/null
+++ b/kernel_patches/fixes/core_0160_xrc_fix_memleak.patch
@@ -0,0 +1,65 @@
+core: fix memory leak in XRC userspace cleanup.
+
+When userspace invoked close_xrc_domain, the domain was deleted from
+the active-domains list for that process -- even if the process had
+active QPs and SRQs using that domain.
+
+The domain, however was not destroyed, and its resources were still
+allocated.
+
+As a result of deleting the domain from the active domains for the
+process, however, no attempt would be made to destroy the domain
+during cleanup. If it was the last process using that domain,
+the domain's resources remained allocated forever (memory/resource leak).
+
+The fix is to avoid deleting the domain from the active list if there
+are outstanding resources (QPs or SRQs) in the process which use it
+(same as is done for XRC_RCV qp's and xrc qp registration).
+
+Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>
+
+Index: ofed_1_4/drivers/infiniband/core/uverbs_cmd.c
+===================================================================
+--- ofed_1_4.orig/drivers/infiniband/core/uverbs_cmd.c	2008-08-07 19:24:52.000000000 +0300
++++ ofed_1_4/drivers/infiniband/core/uverbs_cmd.c	2008-08-07 19:26:34.000000000 +0300
+@@ -2567,7 +2567,7 @@ ssize_t ib_uverbs_close_xrc_domain(struc
+ 				   int out_len)
+ {
+ 	struct ib_uverbs_close_xrc_domain cmd;
+-	struct ib_uobject *uobj;
++	struct ib_uobject *uobj, *t_uobj;
+ 	struct ib_uxrcd_object *xrcd_uobj;
+ 	struct ib_xrcd *xrcd = NULL;
+ 	struct inode *inode = NULL;
+@@ -2584,6 +2584,31 @@ ssize_t ib_uverbs_close_xrc_domain(struc
+ 		goto err_unlock_mutex;
+ 	}
+ 
++	mutex_lock(&file->mutex);
++	if (!ret) {
++		list_for_each_entry(t_uobj, &file->ucontext->qp_list, list) {
++			struct ib_qp *qp = t_uobj->object;
++			if (qp->xrcd && qp->xrcd == uobj->object) {
++				ret = -EBUSY;
++				break;
++			}
++		}
++	}
++	if (!ret) {
++		list_for_each_entry(t_uobj, &file->ucontext->srq_list, list) {
++			struct ib_srq *srq = t_uobj->object;
++			if (srq->xrcd && srq->xrcd == uobj->object) {
++				ret = -EBUSY;
++				break;
++			}
++		}
++	}
++	mutex_unlock(&file->mutex);
++	if (ret) {
++		put_uobj_write(uobj);
++		goto err_unlock_mutex;
++	}
++
+ 	xrcd_uobj = container_of(uobj, struct ib_uxrcd_object, uobject);
+ 	if (!list_empty(&xrcd_uobj->xrc_reg_qp_list)) {
+ 		ret = -EBUSY;


