[openib-general] [PATCH] mthca updates (2.6.8 dependent)

Dror Goldenberg gdror at mellanox.co.il
Wed Sep 1 14:59:42 PDT 2004



> -----Original Message-----
> From: Grant Grundler [mailto:iod00d at hp.com] 
> Sent: Tuesday, August 31, 2004 8:23 AM
> 
> On Mon, Aug 30, 2004 at 09:10:35PM -0700, Roland Dreier wrote:
> > The Intel E7500 Xeon chipset PCI bridge datasheet does say that an 
> > interrupt message causes the bridge to flush its write buffers to 
> > preserve precisely this ordering, but I don't know whether everyone 
> > followed or will follow this example (I'm sure we'll see many more MSI
> > implementations on PCI Express).
> 
> Maybe the flush is required because of DMA write coalescing 
> in the E7500 chipset?
> 
> Ie any DMA writes which can't be coalesced will cause this 
> kind of flushing behavior. I don't really know since I 
> don't know how the E7500 handles cache coherency.
> 
> And yes, I'm certain some chipsets get DMA write coalescing 
> wrong. Look at drivers/net/tg3.c and search for 
> TG3_FLAG_MBOX_WRITE_REORDER in tg3_get_invariants().
> 
> grant

I think that one of the most important goals of MSI was to avoid the
MMIO read, which is expensive in CPU time. I think that for mainstream
systems you can assume that there is ordering between MSI and other DMA
writes. If some systems have related bugs, then in that case they can add
a workaround (either performing an MMIO read or falling back to good old
regular interrupts).

Another thing worth mentioning is that the InfiniHost will fire an
interrupt again if the interrupt handler didn't clean up the event queue.
I.e., in the rare (or nonexistent) case of the MSI bypassing the preceding
DMA write, if the driver doesn't see the EQE in memory at the time it
rearms the EQ, it'll get an immediate interrupt. So you should be safe
anyway...
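
To make the rearm argument concrete, here is a toy user-space model in C.
It is only a sketch: the struct, the function names and the counters are
all hypothetical and are not taken from mthca or from the InfiniHost
programming interface. The one behavior it assumes is the one described
above, namely that rearming an EQ the device still considers non-empty
raises an interrupt immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not mthca code. */
struct toy_eq {
    unsigned posted;     /* EQEs the device has generated */
    unsigned visible;    /* EQEs whose DMA write has landed in host memory */
    unsigned consumer;   /* EQEs the driver has consumed */
    bool     armed;      /* device may raise one interrupt while armed */
    unsigned interrupts; /* interrupts raised so far */
};

/* Device generates an EQE; the MSI may (in a buggy system) be seen
 * before the EQE itself is visible in memory. */
static inline void toy_device_post(struct toy_eq *eq)
{
    assert(eq != NULL);
    eq->posted++;
    if (eq->armed) {
        eq->armed = false;
        eq->interrupts++;
    }
}

/* All outstanding EQE writes become visible to the CPU. */
static inline void toy_dma_lands(struct toy_eq *eq)
{
    assert(eq != NULL);
    eq->visible = eq->posted;
}

/* Interrupt handler: consume what is visible, then rearm.
 * Assumption: rearming with EQEs still outstanding fires at once. */
static inline unsigned toy_handler(struct toy_eq *eq)
{
    unsigned handled = 0;

    assert(eq != NULL);
    while (eq->consumer != eq->visible) {
        eq->consumer++;
        handled++;
    }
    if (eq->consumer != eq->posted)
        eq->interrupts++;   /* immediate interrupt, stays unarmed */
    else
        eq->armed = true;   /* queue is clean, wait for the next event */
    return handled;
}
```

In this model, a handler run that finds no EQE (because the MSI got ahead
of the DMA write) still ends with a second interrupt, and the next run
picks the EQE up once the write has landed. No event is lost, at the cost
of one spurious pass through the handler.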

BTW, here's an excerpt from the PCI spec* explaining that the driver need
not do any MMIO read from the device in order to ensure that the data is 
in memory at the time the MSI is received:

"An MSI or MSI-X message, by virtue of being a posted memory write (PMW)
transaction, is prohibited by PCI ordering rules from passing PMW
transactions sent earlier by the function. The system must guarantee that an
interrupt service routine invoked as a result of a given message will
observe any updates performed by PMW transactions arriving prior to that
message. Thus, the interrupt service routine of a device driver is not
required to read from a device register in order to ensure data consistency
with previous PMW transactions. However, if multiple MSI-X Table entries
share the same vector, the interrupt service routine may need to read from
some device specific register to determine which interrupt sources need
servicing."

* Section 6.8.3.6 at the end in
http://www.pcisig.com/specifications/conventional/msi-x_ecn.pdf
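
As an illustration of what the excerpt allows, here is a rough C sketch of
an MSI handler that goes straight to the EQE in memory and only does the
expensive register read when a workaround flag is set. Everything in it is
hypothetical (the struct, the flag, the ownership value of 1, the fake
register read); it is a user-space model of the idea, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device as seen by its MSI handler. */
struct toy_dev {
    volatile uint32_t eqe_owner; /* 1 = EQE written by the device, valid for software */
    unsigned mmio_reads;         /* how many (expensive) register reads were issued */
    bool flush_workaround;       /* set only for chipsets with broken MSI ordering */
};

/* Stand-in for a real register read, which would also flush posted writes. */
static inline uint32_t toy_mmio_read(struct toy_dev *d)
{
    d->mmio_reads++;
    return 0;
}

/* Returns true if an event was found and consumed. */
static inline bool toy_msi_handler(struct toy_dev *d)
{
    /* Normal case: rely on PCI ordering, no read from the device. */
    if (d->flush_workaround)
        (void)toy_mmio_read(d);

    if (d->eqe_owner != 1)
        return false;   /* nothing visible yet */

    d->eqe_owner = 0;   /* hand the entry back to the device */
    return true;
}
```

With the flag clear, the handler consumes the event with zero register
reads, which is the saving MSI was meant to provide. Setting the flag
models the workaround for a system that gets the ordering wrong.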

-Dror