[ewg] [PATCH] Report proper error code in [was: trying to reproduce the crash]

Olaf Kirch olaf.kirch at oracle.com
Mon Feb 4 05:49:50 PST 2008


I've been struggling with crashes in mthca_arbel_map_phys for a few days (triggered
by RDS), and I think I'm finally making some progress.

mthca_fmr_alloc does this:

        if (mthca_is_memfree(dev)) {
                err = mthca_table_get(dev, dev->mr_table.mpt_table, key);
                if (err)
                        goto err_out_mpt_free;
                ...
        }

        /* when we get here, err == 0 (at least for memfree cards) */
        mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
        if (IS_ERR(mr->mtt))
                goto err_out_table;
        ...

err_out_table:
        /* clean up some */
        return err;

I.e., we set mr->mtt to some ERR_PTR(-whatever) and return success.
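To make the failure mode concrete, here is a small userspace model of that control flow. It is a sketch under stated assumptions: ERR_PTR, IS_ERR and PTR_ERR are simplified re-implementations modeled on the kernel's include/linux/err.h (not the kernel headers themselves), and model_mtt, model_fmr, model_alloc_mtt and model_fmr_alloc_buggy are hypothetical stand-ins, not mthca code.

```c
#include <errno.h>

/* Simplified userspace stand-ins modeled on include/linux/err.h
 * (assumption: no unlikely()/__force annotations). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical, stripped-down stand-ins for the mthca structures. */
struct model_mtt { int first_seg; };
struct model_fmr { struct model_mtt *mtt; };

/* Hypothetical allocator that always fails, the way __mthca_alloc_mtt can. */
static struct model_mtt *model_alloc_mtt(void)
{
	return ERR_PTR(-ENOMEM);
}

/* Same control flow as the excerpt: err is already 0 when the MTT
 * allocation fails, and nothing updates it before the goto. */
static int model_fmr_alloc_buggy(struct model_fmr *mr)
{
	int err = 0;	/* e.g. left at 0 by a successful mthca_table_get() */

	mr->mtt = model_alloc_mtt();
	if (IS_ERR(mr->mtt))
		goto err_out_table;

	return 0;

err_out_table:
	/* clean up some */
	return err;	/* BUG: caller sees success, mr->mtt is an ERR_PTR */
}
```

Calling model_fmr_alloc_buggy() returns 0 while IS_ERR(mr.mtt) is true and PTR_ERR(mr.mtt) is -ENOMEM, which is exactly the state that later code ends up dereferencing.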

The same problem exists when mailbox allocation fails.
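The usual kernel idiom is to copy the encoded errno into err with PTR_ERR() before each goto, and to order the cleanup labels so each one undoes only what has already succeeded. The sketch below models that for the two allocations involved (MTT, then mailbox). As before, the err.h helpers are simplified userspace re-implementations, and model_alloc, model_fmr and model_fmr_alloc_fixed are hypothetical names with a failure-injection flag added for testing. Resetting mr->mtt to NULL on failure is a choice made by this sketch, not something the patch does.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified userspace stand-ins modeled on include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator: returns ERR_PTR(-ENOMEM) when asked to fail
 * (failure injection) or when malloc() itself fails. */
static void *model_alloc(int fail)
{
	void *p;

	if (fail)
		return ERR_PTR(-ENOMEM);
	p = malloc(16);
	return p ? p : ERR_PTR(-ENOMEM);
}

struct model_fmr { void *mtt; };

static int model_fmr_alloc_fixed(struct model_fmr *mr,
				 int fail_mtt, int fail_mailbox)
{
	void *mailbox;
	int err;

	mr->mtt = model_alloc(fail_mtt);
	if (IS_ERR(mr->mtt)) {
		err = PTR_ERR(mr->mtt);	/* propagate the real errno */
		goto err_out;
	}

	mailbox = model_alloc(fail_mailbox);
	if (IS_ERR(mailbox)) {
		err = PTR_ERR(mailbox);	/* same fix for the mailbox */
		goto err_out_free_mtt;
	}

	/* ... fill in the MPT entry via the mailbox ... */
	free(mailbox);
	return 0;

err_out_free_mtt:
	free(mr->mtt);		/* undo only what succeeded */
err_out:
	mr->mtt = NULL;		/* sketch's choice: no stale ERR_PTR left behind */
	return err;
}
```

With either failure injected the function now returns -ENOMEM, and with no failure it returns 0 and leaves a valid mr->mtt that the caller must free.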

I fixed this with the patch below, and now I'm making some progress.
First, the kernel reports:

RDS/IB: ib_alloc_fmr failed (err=-12)

which is good: now we get a decent error code instead of a crash.
A little later, it complains:

ib_mthca 0000:05:00.0: SW2HW_MPT returned status 0x0a

which doesn't sound quite as good. Things are thoroughly hosed from
that moment on, although reloading ib_mthca seems to fix them.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir at lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
--------------- snip -------------------
From: Olaf Kirch <olaf.kirch at oracle.com>
Subject: Return proper error codes from mthca_fmr_alloc

If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc
could still return 0 (success), at least on mem-free cards, where err
is already 0 at that point. This leads to crashes further down the
road, when we try to dereference e.g. mr->mtt, which is really
ERR_PTR(-ENOMEM).

Signed-off-by: Olaf Kirch <olaf.kirch at oracle.com>
---
 drivers/infiniband/hw/mthca/mthca_mr.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- ofa_kernel-1.3.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -613,8 +613,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 			sizeof *(mr->mem.tavor.mpt) * idx;
 
 	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
-	if (IS_ERR(mr->mtt))
+	if (IS_ERR(mr->mtt)) {
+		err = PTR_ERR(mr->mtt);
 		goto err_out_table;
+	}
 
 	mtt_seg = mr->mtt->first_seg * MTHCA_MTT_SEG_SIZE;
 
@@ -627,8 +629,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 		mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg;
 
 	mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
-	if (IS_ERR(mailbox))
+	if (IS_ERR(mailbox)) {
+		err = PTR_ERR(mailbox);
 		goto err_out_free_mtt;
+	}
 
 	mpt_entry = mailbox->buf;
 