[openib-general] Re: [PATCH] mthca - command interface - revised

Grant Grundler iod00d at hp.com
Wed Feb 15 15:08:56 PST 2006


On Wed, Feb 15, 2006 at 01:34:00PM -0800, Roland Dreier wrote:
>     Michael> AFAIK, which of the two options gives better performance
>     Michael> might depend on the application and the specific system.
>     Michael> For now, Eli made the simpler option the default.
> 
> Have you seen cases where using the HCR is faster?  It seems that in
> both cases we are doing posted writes to PCI memory, except that the
> HCR case has to do at least one (slow) read to check the go bit.  The
> doorbell case does use more write barriers since all the writes have
> to be ordered, but I have a hard time believing that the write
> barriers are anywhere near as expensive as the read of the go bit.

me too.

AFAIK, the write barriers only guarantee that the write has left the CPU,
is in flight, and is subject to PCI ordering rules. The MMIO read is
going to cost 1000-3000 CPU cycles depending on chipset, CPU speed,
and which device register is being read.
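
To put rough numbers on that, here is a minimal back-of-the-envelope
sketch in C. Only the 1000-3000 cycle MMIO read range comes from the
discussion above. The struct names, the helper functions, and every
write and barrier count or cost are hypothetical placeholders, not
measurements and not the real mthca command layout.

```c
#include <assert.h>

/* Per-command operation counts for one way of posting a command.
 * Any counts plugged in here are illustrative assumptions. */
struct cmd_path {
    long posted_writes;   /* posted MMIO writes (CPU does not wait) */
    long write_barriers;  /* write barriers ordering those writes */
    long mmio_reads;      /* MMIO reads (CPU stalls for the round trip) */
};

/* Per-operation CPU cost in cycles. Only the 1000-3000 cycle range for
 * read_cycles comes from the message; the rest are placeholders. */
struct op_cost {
    long write_cycles;
    long barrier_cycles;
    long read_cycles;
};

/* CPU cycles spent issuing one command on the given path. */
static long cmd_cpu_cycles(struct cmd_path p, struct op_cost c)
{
    return p.posted_writes * c.write_cycles
         + p.write_barriers * c.barrier_cycles
         + p.mmio_reads * c.read_cycles;
}

/* Per-barrier cost (in cycles) at which the doorbell path's extra
 * barriers would cancel out the HCR path's extra MMIO reads.
 * c.barrier_cycles is ignored, since it is the unknown being solved for. */
static long break_even_barrier_cycles(struct cmd_path hcr,
                                      struct cmd_path db,
                                      struct op_cost c)
{
    long extra_barriers = db.write_barriers - hcr.write_barriers;
    long fixed_gap = (hcr.mmio_reads - db.mmio_reads) * c.read_cycles
                   + (hcr.posted_writes - db.posted_writes) * c.write_cycles;

    /* Only meaningful when the doorbell path really uses more barriers. */
    assert(extra_barriers > 0);
    return fixed_gap / extra_barriers;
}
```

With made-up counts of 7 writes, 1 barrier and 1 read for the HCR path,
8 writes, 4 barriers and no reads for the doorbell path, and costs of
10, 50 and 1000 cycles per write, barrier and read, the HCR path comes
out at 1120 cycles and the doorbell path at 280. Each barrier would
have to cost about 330 cycles before the two break even, and that is
with the read at the cheap end of its range.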

However, that doesn't mean all metrics are better just because
the CPU is more efficient. Forcing things down the PCI bus
(an MMIO read pushes any posted writes ahead of it out to the device)
will sometimes improve latency-sensitive benchmarks.

grant


