[openib-general] Race in mthca_cmd_post()
John Partridge
johnip at sgi.com
Thu Oct 19 12:48:28 PDT 2006
Roland Dreier wrote:
> chas> i would guess the read to the mmio region is flushing the
> chas> writes to the config register but the read happens "too
> chas> soon" after those writes. on a more mundance computer, the
> chas> write/write/read probably wouldnt be batched together.
>
> config writes can't be posted though, so that doesn't make sense.
>
> - R.
Roland,
I think Chas is correct, here's why.
A trace comparison better explains what is going on.
First the FAILING case :-
23406: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23414: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23422: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23430: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23438: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23446: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23454: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23462: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23470: Split compl. Lower A = 00 Req = (0,0,0) Tag = 0 Comp = (0,2,0) WAIT = 1 (Error completion)
23476: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE)
We see here that a Config Write to Reg 01 (PCI_COMMAND) is issued across the
bus. We then see the Memory Read to 698 that goes across the bus before the
completion of the Config Write to Reg 01.
Now the trace of the WORKING (fixed) case :-
23406: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23414: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23422: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23430: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23438: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2
23446: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE)
23452: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23460: Split compl. Lower A = 00 D = 00000000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
23466: Memory Write A = 00280680 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2
23474: Memory Write A = 00280684 Req = (0,0,0) Tag = 1 BC = 4 WAIT = 2
23482: Memory Write A = 00280688 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2
23490: Memory Write A = 0028068c Req = (0,0,0) Tag = 1 BC = 4 WAIT = 2
23498: Memory Write A = 00280690 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2
23506: Memory Write A = 00280694 Req = (0,0,0) Tag = 1 BC = 4 WAIT = 2
23514: Memory Write A = 00280698 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2
23522: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 2 WAIT = 2
23530: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 2 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
23536: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23544: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
23550: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23558: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
23564: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23572: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
Here we see completion of the Config Write to Reg 01 (PCI_COMMAND) before the
PIO Memory Read to 698 via the BAR.
The mmiowb() in my earlier patch was not actually flushing anything, but rather
introduced enough of a delay for the Config Write to complete before the
Memory Read to 698 in mthca_cmd_post_hcr(). This is why it appeared to fix the
MCA. This is not though, as I thought originally, a platform ordering problem.
In the failing case, the driver did the correct thing and "restore" the PCI
Config regs in the correct order, the Config Writes went across the PCI-X bus
in the correct order, but, the last Config Write did not make it to the HCA
across the other side of the card's bridge (PPB) before the Memory Read to 698
via the BAR. Therefor the read failed because HCA's BAR was not yet available
(because the Config Write had not completed the PCI_COMMAND yet).
The area of code that issues the Memory read (mthca_cmd_post() and in turn
mthca_cmd_post_hcr()) is protected by a mutex_lock, but mthca_reset() is not.
So, once we issue the last Config Write of PCI_COMMAND and leave mthca_reset()
there is nothing to prevent mthca_cmd_post_hcr() issuing the PIO Memory Read
of 698 (vi the BAR).
So, to make sure that the Config Write to Reg 01 completes before we exit mthca_reset()
I suggest we do a Config Read of PCI_COMMAND before we leave mthca_reset(), something
like this ? :-
--- openib-1.1-buildable-ORIG/drivers/infiniband/hw/mthca/mthca_reset.c 2006-09-20 07:19:24.000000000 -0500
+++ openib-1.1/drivers/infiniband/hw/mthca/mthca_reset.c 2006-10-19 13:55:04.292275707 -0500
@@ -281,6 +281,17 @@
goto out;
}
+ /*
+ * Perform a "flush" of the pci_write_config_dword() for PCI_COMMAND.
+ * The PCI_COMMAND to the HCA must complete before we exit mthca_reset()
+ * or any PIO Memory Reads via the BAR will fail at this point.
+ */
+ if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
+ err = -ENODEV;
+ mthca_err(mdev, "Couldn't access HCA memory after restoring, "
+ "aborting.\n");
+ }
+
out:
if (bridge)
pci_dev_put(bridge);
I have tested this and it does indeed stop the MCA. With the traces and
the my above text does this clear up the confusion ?
John
--
John Partridge
Silicon Graphics Inc
Tel: 651-683-3428
Vnet: 233-3428
E-Mail: johnip at sgi.com
More information about the general
mailing list