[ewg] [PATCH] Report proper error code in [was: trying to reproduce the crash]
Olaf Kirch
olaf.kirch at oracle.com
Mon Feb 4 05:49:50 PST 2008
I've been struggling with crashes in mthca_arbel_map_phys for a few days (triggered
by RDS), and I think I'm finally making some progress.
mthca_fmr_alloc does this:
	if (mthca_is_memfree(dev)) {
		err = mthca_table_get(dev, dev->mr_table.mpt_table, key);
		if (err)
			goto err_out_mpt_free;
		...
	}

	/* when we get here, err == 0 (at least for memfree cards) */
	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
	if (IS_ERR(mr->mtt))
		goto err_out_table;

err_out_table:
	/* clean up some */
	return err;
That is, we set mr->mtt to some ERR_PTR(-whatever) and return success.
The same problem exists when mailbox allocation fails.
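
To make the failure mode concrete, here is a minimal userspace sketch of the pattern described above. It is not the driver code: the ERR_PTR, PTR_ERR and IS_ERR helpers are simplified stand-ins for the kernel's, and alloc_always_fails, fmr_alloc_buggy and fmr_alloc_fixed are hypothetical names made up for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that always fails with -ENOMEM. */
static void *alloc_always_fails(void)
{
	return ERR_PTR(-ENOMEM);
}

/* Buggy shape: err still holds 0 from an earlier successful step. */
static int fmr_alloc_buggy(void **out)
{
	int err = 0;

	*out = alloc_always_fails();
	if (IS_ERR(*out))
		goto err_out;
	return 0;

err_out:
	return err;	/* 0, although *out is an error pointer */
}

/* Fixed shape: copy the error out of the pointer before jumping. */
static int fmr_alloc_fixed(void **out)
{
	int err = 0;

	*out = alloc_always_fails();
	if (IS_ERR(*out)) {
		err = PTR_ERR(*out);
		goto err_out;
	}
	return 0;

err_out:
	return err;	/* -ENOMEM */
}
```

A caller that only checks the return value of the buggy variant sees success and goes on to dereference the error pointer, which is the kind of crash described above.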
I fixed this, using the patch below. Now I'm making some progress:
First, the kernel reports:
RDS/IB: ib_alloc_fmr failed (err=-12)
which is good - now we get a decent error code instead of a crash.
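
For reference, err=-12 is -ENOMEM on Linux, which matches the ERR_PTR(-ENOMEM) case. A small userspace sketch for decoding such negative return codes follows; describe_err is a hypothetical helper name, not something from the driver or from RDS.

```c
#include <errno.h>
#include <string.h>

/*
 * Turn a kernel-style negative error return into a readable message.
 * Accepts either sign, so both -12 and 12 map to the ENOMEM text on Linux.
 */
static const char *describe_err(int err)
{
	return strerror(err < 0 ? -err : err);
}
```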
A little later, it complains:
ib_mthca 0000:05:00.0: SW2HW_MPT returned status 0x0a
which doesn't sound quite as good... and things are very hosed
from that moment on; reloading ib_mthca seems to fix things, however.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
--------------- snip -------------------
From: Olaf Kirch <olaf.kirch at oracle.com>
Subject: Return proper error codes from mthca_fmr_alloc
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc
would return 0 (success) no matter what. This leads to crashes a little
down the road, when we try to dereference e.g. mr->mtt, which was
really ERR_PTR(-ENOMEM).
Signed-off-by: Olaf Kirch <olaf.kirch at oracle.com>
---
drivers/infiniband/hw/mthca/mthca_mr.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
Index: ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- ofa_kernel-1.3.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -613,8 +613,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 			sizeof *(mr->mem.tavor.mpt) * idx;
 	mr->mtt = __mthca_alloc_mtt(dev, list_len, dev->mr_table.fmr_mtt_buddy);
-	if (IS_ERR(mr->mtt))
+	if (IS_ERR(mr->mtt)) {
+		err = PTR_ERR(mr->mtt);
 		goto err_out_table;
+	}
 	mtt_seg = mr->mtt->first_seg * MTHCA_MTT_SEG_SIZE;
@@ -627,8 +629,10 @@ int mthca_fmr_alloc(struct mthca_dev *de
 		mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg;
 	mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
-	if (IS_ERR(mailbox))
+	if (IS_ERR(mailbox)) {
+		err = PTR_ERR(mailbox);
 		goto err_out_free_mtt;
+	}
 	mpt_entry = mailbox->buf;