[ofa-general] mpi failures on large ia64/ofed/IB clusters

Roland Dreier rdreier at cisco.com
Sat Oct 6 13:40:10 PDT 2007


 > Roland, should I submit a proper patch, or do you want 
 > to take care of this? (And thanks a lot, too!)

Thanks for testing... I can take care of this -- I just added the
patches below to my tree (since as far as I can see, mlx4 would be
susceptible to the same bug):

commit 66547550601a706e2b958ea351b34d8dee066b18
Author: Roland Dreier <rolandd at cisco.com>
Date:   Sat Oct 6 13:35:24 2007 -0700

    IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up
    
    Firmware commands are sent to the HCA by writing multiple words to a
    command register block.  Access to this block of registers is
    serialized with a mutex.  However, on large SGI systems, problems were
    seen with multiple CPUs issuing FW commands at the same time, because
    the writes to the register block may be reordered within the system
    interconnect and reach the HCA in a different order than they were
    issued (even with the mutex).  Fix this by adding an mmiowb() before
    dropping the mutex.
    
    Tested-by: Arthur Kepner <akepner at sgi.com>
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index acc9589..6966f94 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -290,6 +290,12 @@ static int mthca_cmd_post(struct mthca_dev *dev,
 		err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
 					 op_modifier, op, token, event);
 
+	/*
+	 * Make sure that our HCR writes don't get mixed in with
+	 * writes from another CPU starting a FW command.
+	 */
+	mmiowb();
+
 	mutex_unlock(&dev->cmd.hcr_mutex);
 	return err;
 }


commit 8c2348735c721eed6f08343eed851bfbec6e5a9a
Author: Roland Dreier <rolandd at cisco.com>
Date:   Sat Oct 6 13:39:38 2007 -0700

    mlx4_core: Use mmiowb() to avoid firmware commands getting jumbled up
    
    Firmware commands are sent to the HCA by writing multiple words to a
    command register block.  Access to this block of registers is
    serialized with a mutex.  However, on large SGI systems writes to the
    register block may be reordered within the system interconnect and
    reach the HCA in a different order than they were issued (even with
    the mutex).  Fix this by adding an mmiowb() before dropping the mutex.
    
    This bug was observed with real workloads with the similar FW command
    code in the mthca driver, and adding the mmiowb() as in commit
    66547550 ("IB/mthca: Use mmiowb() to avoid firmware commands getting
    jumbled up") was confirmed to fix the problems, so we should add the
    same fix to mlx4.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/net/mlx4/cmd.c b/drivers/net/mlx4/cmd.c
index b540820..db49051 100644
--- a/drivers/net/mlx4/cmd.c
+++ b/drivers/net/mlx4/cmd.c
@@ -184,6 +184,13 @@ static int mlx4_cmd_post(struct mlx4_dev *dev, u64 in_param, u64 out_param,
 					       (event ? (1 << HCR_E_BIT) : 0)	|
 					       (op_modifier << HCR_OPMOD_SHIFT) |
 					       op),			  hcr + 6);
+
+	/*
+	 * Make sure that our HCR writes don't get mixed in with
+	 * writes from another CPU starting a FW command.
+	 */
+	mmiowb();
+
 	cmd->toggle = cmd->toggle ^ 1;
 
 	ret = 0;

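The locking pattern that both patches establish (write every command word, issue the barrier, and only then drop the mutex) can be illustrated with a small user-space sketch. This is not driver code: `mock_dev`, `mock_cmd_post()`, `mock_mmiowb()` and `HCR_WORDS` are hypothetical names invented for the illustration, the register block is an ordinary array, and the barrier is a C11 fence standing in for the kernel's mmiowb(). A user-space fence cannot reproduce the interconnect reordering seen on the SGI systems; the sketch only shows where the barrier sits relative to the unlock.

```c
/* User-space illustration only; none of these names exist in the kernel. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define HCR_WORDS 7	/* assumed size of the mock command register block */

/* Stand-in for mmiowb(): here it is just a sequentially consistent fence. */
#define mock_mmiowb() atomic_thread_fence(memory_order_seq_cst)

struct mock_dev {
	pthread_mutex_t hcr_mutex;		/* serializes command posting */
	volatile uint32_t hcr[HCR_WORDS];	/* mock register block */
};

static void mock_dev_init(struct mock_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_init(&dev->hcr_mutex, NULL);
	for (int i = 0; i < HCR_WORDS; i++)
		dev->hcr[i] = 0;
}

static int mock_cmd_post(struct mock_dev *dev, const uint32_t *words)
{
	assert(dev != NULL && words != NULL);

	pthread_mutex_lock(&dev->hcr_mutex);

	/* Write the whole command, one word at a time. */
	for (int i = 0; i < HCR_WORDS; i++)
		dev->hcr[i] = words[i];

	/*
	 * Barrier goes after the last register write and before the
	 * unlock, so the next lock holder's writes cannot be mixed in.
	 */
	mock_mmiowb();

	pthread_mutex_unlock(&dev->hcr_mutex);
	return 0;
}
```

In the real drivers the writes go to device registers and the barrier is mmiowb(), as in the two diffs above; the point of the sketch is only the ordering of the three steps inside the critical section.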

