[ofa-general] Re: libmlx4 wc flash

Michael S. Tsirkin mst at dev.mellanox.co.il
Sat May 12 21:59:38 PDT 2007


> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: libmlx4 wc flash
> 
>  > > By the way, do you know what the best way to flush WC buffers for i386
>  > > is?  I know on x86-64 sfence is the way to go, and on ia64 I think we
>  > > want fc, but I'm not sure what the right thing is for old 32-bit
>  > > processors.
>  > 
>  > Maybe just disable WC there?
> 
> I think we want to use write combining on 32-bit kernels or 32-bit
> userspace.  But I don't want to rely on SSE2 instructions for i386 binaries.
> 
>  > I don't think it works this way: if PAT is programmed to UC,
>  > I think you get UC access with movntq. No?
> 
> You're right -- I misremembered what the non-temporal stuff does, but
> I just checked and the manual says:
> 
>  "The memory type of the region being written to can override the
>   non-temporal hint, if the memory address specified for the
>   non-temporal store is in an uncacheable (UC) or write protected (WP)
>   memory region."

I just found this:
• Write Combining (WC) — System memory locations are not cached (as with
uncacheable memory) and coherency is not enforced by the processor’s bus
coherency protocol. Speculative reads are allowed. Writes may be delayed and
combined in the write combining buffer (WC buffer) to reduce memory accesses.
If the WC buffer is partially filled, the writes may be delayed until the next
occurrence of a serializing event; such as, an SFENCE or MFENCE instruction,
CPUID execution, a read or write to uncached memory, an interrupt occurrence,
or a LOCK instruction execution. This type of cache control is appropriate for
video frame buffers, where the order of writes is unimportant as long as the
writes update memory so they can be seen on the graphics display. See Section
10.3.1, “Buffering of Write Combining Memory Locations,” for more information
about caching the WC memory type. This memory type is available in the Pentium
Pro and Pentium II processors by programming the MTRRs or in the Pentium III,
Pentium 4, and Intel Xeon processors by programming the MTRRs or by selecting
it through the PAT.


But confusingly, in another place it says:


Software should access semaphores (shared memory used for signalling between
multiple processors) using identical addresses and operand lengths. For
example, if one processor accesses a semaphore using a word access, other
processors should not access the semaphore using a byte access. Do not use
semaphores on the WC memory type.

So, could we use a LOCK instruction to fence WC writes out?
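
For concreteness, here is a rough sketch of what that might look like.
It is untested against real WC mappings and rests on an assumption the
manual text above only suggests: that a locked read-modify-write drains
the WC buffers on 32-bit parts. The names wc_flush() and
wc_copy_and_flush() are made up for illustration. The i386 branch uses
only the LOCK prefix, so it needs neither SSE nor SSE2.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: try to push pending write-combining stores out.
 * x86-64: sfence, as discussed earlier in the thread.
 * i386:   a locked add of zero to the top of the stack. Assumption:
 *         the LOCK acts as the "serializing event" the manual lists.
 * other:  a generic full barrier, which says nothing about WC buffers.
 */
static inline void wc_flush(void)
{
#if defined(__x86_64__)
	__asm__ __volatile__("sfence" ::: "memory");
#elif defined(__i386__)
	__asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
	__sync_synchronize();
#endif
}

/*
 * Hypothetical usage: copy a block of 32-bit words to a (possibly WC
 * mapped) destination, then flush. Returns the number of words written.
 */
static inline size_t wc_copy_and_flush(volatile uint32_t *dst,
				       const uint32_t *src, size_t words)
{
	size_t i;

	for (i = 0; i < words; i++)
		dst[i] = src[i];

	wc_flush();
	return i;
}
```

If the assumption holds, the i386 path avoids sfence entirely, at the
cost of one locked access per flush.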

-- 
MST


