[ofa-general] mpi failures on large ia64/ofed/IB clusters
Roland Dreier
rdreier at cisco.com
Sat Oct 6 13:40:10 PDT 2007
> Roland, should I submit a proper patch, or do you want
> to take care of this? (And thanks a lot, too!)

Thanks for testing... I can take care of this -- I just added the
patches below to my tree (since as far as I can see, mlx4 would be
susceptible to the same bug):

commit 66547550601a706e2b958ea351b34d8dee066b18
Author: Roland Dreier <rolandd at cisco.com>
Date:   Sat Oct 6 13:35:24 2007 -0700

    IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up

    Firmware commands are sent to the HCA by writing multiple words to a
    command register block.  Access to this block of registers is
    serialized with a mutex.  However, on large SGI systems, problems were
    seen with multiple CPUs issuing FW commands at the same time, because
    the writes to the register block may be reordered within the system
    interconnect and reach the HCA in a different order than they were
    issued (even with the mutex).  Fix this by adding an mmiowb() before
    dropping the mutex.

    Tested-by: Arthur Kepner <akepner at sgi.com>
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index acc9589..6966f94 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -290,6 +290,12 @@ static int mthca_cmd_post(struct mthca_dev *dev,
 		err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
 					 op_modifier, op, token, event);
 
+	/*
+	 * Make sure that our HCR writes don't get mixed in with
+	 * writes from another CPU starting a FW command.
+	 */
+	mmiowb();
+
 	mutex_unlock(&dev->cmd.hcr_mutex);
 	return err;
 }

commit 8c2348735c721eed6f08343eed851bfbec6e5a9a
Author: Roland Dreier <rolandd at cisco.com>
Date:   Sat Oct 6 13:39:38 2007 -0700

    mlx4_core: Use mmiowb() to avoid firmware commands getting jumbled up

    Firmware commands are sent to the HCA by writing multiple words to a
    command register block.  Access to this block of registers is
    serialized with a mutex.  However, on large SGI systems writes to the
    register block may be reordered within the system interconnect and
    reach the HCA in a different order than they were issued (even with
    the mutex).  Fix this by adding an mmiowb() before dropping the mutex.

    This bug was observed with real workloads with the similar FW command
    code in the mthca driver, and adding the mmiowb() as in commit
    66547550 ("IB/mthca: Use mmiowb() to avoid firmware commands getting
    jumbled up") was confirmed to fix the problems, so we should add the
    same fix to mlx4.

    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/net/mlx4/cmd.c b/drivers/net/mlx4/cmd.c
index b540820..db49051 100644
--- a/drivers/net/mlx4/cmd.c
+++ b/drivers/net/mlx4/cmd.c
@@ -184,6 +184,13 @@ static int mlx4_cmd_post(struct mlx4_dev *dev, u64 in_param, u64 out_param,
 					       (event ? (1 << HCR_E_BIT) : 0) |
 					       (op_modifier << HCR_OPMOD_SHIFT) |
 					       op), hcr + 6);
+
+	/*
+	 * Make sure that our HCR writes don't get mixed in with
+	 * writes from another CPU starting a FW command.
+	 */
+	mmiowb();
+
 	cmd->toggle = cmd->toggle ^ 1;
 
 	ret = 0;
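
For readers outside the kernel tree, the ordering that both patches
enforce can be sketched in userspace C. This is only an illustration
under stated assumptions: mock_hca, mock_hca_init(), mock_cmd_post() and
mock_mmiowb() are hypothetical names, the register block is ordinary
memory rather than MMIO, a pthread mutex stands in for the kernel mutex,
and the 7-word block size is taken from the "hcr + 6" write in the mlx4
hunk. mock_mmiowb() is a plain full memory barrier and does not
reproduce what mmiowb() does on real hardware; the point is only where
the barrier sits, namely after the last register write and before the
lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define HCR_WORDS 7	/* assumed block size for this mock only */

/* Hypothetical stand-in for a device's command register block. */
struct mock_hca {
	pthread_mutex_t hcr_mutex;		/* serializes command posting */
	volatile uint32_t hcr[HCR_WORDS];	/* plain memory, not real MMIO */
};

/*
 * Userspace placeholder for the kernel's mmiowb().  A full memory
 * barrier is only an approximation: the real primitive orders MMIO
 * writes on their way to the device, which an ordinary memory barrier
 * does not guarantee on every platform.
 */
static inline void mock_mmiowb(void)
{
	__sync_synchronize();
}

static void mock_hca_init(struct mock_hca *dev)
{
	pthread_mutex_init(&dev->hcr_mutex, NULL);
	for (int i = 0; i < HCR_WORDS; i++)
		dev->hcr[i] = 0;
}

/* Post one multi-word command; returns 0 on success. */
static int mock_cmd_post(struct mock_hca *dev, const uint32_t *words, int n)
{
	assert(n > 0 && n <= HCR_WORDS);

	pthread_mutex_lock(&dev->hcr_mutex);

	/* Write the words in order; the last one plays the "go" word. */
	for (int i = 0; i < n; i++)
		dev->hcr[i] = words[i];

	/*
	 * The barrier goes after the last register write and before the
	 * unlock, mirroring where the patches place mmiowb().
	 */
	mock_mmiowb();

	pthread_mutex_unlock(&dev->hcr_mutex);
	return 0;
}
```

A caller would set up the device once with mock_hca_init() and then pass
an array of up to HCR_WORDS words to mock_cmd_post(). Putting the
barrier inside the locked region is the design choice that matters: if
it came after the unlock, another CPU could take the mutex and begin its
own writes before the first CPU's writes had been ordered.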