[ofa-general] Re: libmlx4 wc flash

Michael S. Tsirkin mst at dev.mellanox.co.il
Tue May 15 13:11:05 PDT 2007


> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: libmlx4 wc flash
> 
>  >  		memcpy(ctx->bf_page + ctx->bf_offset, ctrl, align(size * 16, 64));
> 
> By the way, why are we aligning the size of the WQE we copy to 64
> bytes?  I copied this from Jack's code but I don't see anything that
> requires it.  We already have:
> 
> 	if (nreq == 1 && inl && size > 1 && size < ctx->bf_buf_size / 16) {
> 
> so we will always have at least 32 bytes to copy.

This is an Intel-specific optimization (for newer Intel processors):

Once the processor has started to evict data from the WC buffer into system
memory, it will make a bus-transaction style decision based on how much of the
buffer contains valid data. If the buffer is full (for example, all bytes are
valid) the processor will execute a burst-write transaction on the bus that
will result in all 32 bytes (P6 family processors) or 64 bytes (Pentium 4 and
Intel Xeon processor) being transmitted on the data bus in a single burst
transaction. If one or more of the WC buffer’s bytes are invalid (for example,
have not been written by software) then the processor will transmit the data to
memory using “partial write” transactions (one chunk at a time, where a “chunk”
is 8 bytes).

In other words, it is important to fill the whole WC buffer to get good speed,
which is why the copy size is rounded up to 64 bytes.
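
To make the rounding concrete, here is a minimal sketch of the padded copy.
The names align_up(), copy_wqe_padded() and WC_BUF_BYTES are made up for
illustration and are not the libmlx4 code; the 64-byte figure is the Pentium 4 /
Xeon WC buffer size from the text above, and the destination here is ordinary
memory rather than a WC-mapped BlueFlame page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed WC buffer size: 64 bytes (Pentium 4 / Xeon per the quoted text). */
#define WC_BUF_BYTES 64

/* Round val up to the next multiple of alignment (a power of two). */
static size_t align_up(size_t val, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (val + alignment - 1) & ~(alignment - 1);
}

/*
 * Copy a WQE of wqe_units 16-byte units to dst, rounding the length up to
 * a whole number of WC buffers so that no buffer is left partially filled.
 * Both src and dst must be valid for the padded length.
 * Returns the number of bytes actually copied.
 */
static size_t copy_wqe_padded(void *dst, const void *src, size_t wqe_units)
{
	size_t len = align_up(wqe_units * 16, WC_BUF_BYTES);

	memcpy(dst, src, len);
	return len;
}
```

With size == 2 (32 bytes of WQE) this copies 64 bytes, and with size == 5
(80 bytes) it copies 128, so every WC buffer touched is written completely.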

Need to check which sizes are good for AMD, PPC, ...

-- 
MST


