[ofa-general] SDP tuning: recv_poll? sdp_zcopy_thresh? other parameters?

Amir Vadai amirv at mellanox.co.il
Sun Jul 19 00:25:36 PDT 2009


Hi Lars,

There are many changes between SDP in OFED 1.4 and 1.5, so I need to know which version you intend to use.

Some important parameters in OFED 1.4 that affect latency are:
1. Nagle - If Nagle is enabled, the first packet is sent with no delay, but subsequent packets within a period of time are collected before being sent. This is good for bandwidth-oriented traffic but bad for latency. You could set the TCP_NODELAY socket option on the "meta" socket; this should disable Nagle there.
2. BCopy/BZCopy - Using bzcopy reduces CPU utilization on the send side (the buffer is sent directly from user memory using the IB SEND verb), but it is only effective for big packets. Pay attention that with a bzcopy send, send() blocks until the buffer has been sent on the wire, while with a regular bcopy send, send() returns almost immediately (after the buffer is copied into the driver's private buffer). The bzcopy threshold is controlled by sdp_zcopy_thresh.
3. recv_poll - If I am not mistaken (it was added to SDP before my time), it is a poll that is done before recvmsg goes to sleep waiting for data.
4. send_poll - This is used mainly for ping-pong traffic. It sets how many polls to do after a send; if set right, it will reduce the number of interrupts on RX.

Other things that influence bandwidth are the send and receive buffer sizes. I've seen cases where they changed the bandwidth by an order of magnitude, but it is hard to give an exact rule here since the behavior is a bit chaotic.

This applies mainly to OFED 1.4 - in 1.5 the logic was changed in some places and more tunables were added.
In 1.5 there is /proc/net/sdpstats, which can give you important information about traffic patterns and SDP handling.

As you can see there are many tunables, and many of them are still changing these days.

Bye,
Amir.

--

Amir Vadai
Software Eng.
Mellanox Technologies
mailto: amirv at mellanox.co.il
Tel +972-3-6259539

On 07/17/2009 06:25 PM, Lars Ellenberg wrote:
> 
> Lets say my typical usage pattern is:
> 
> two nodes, alice and bob.
> two "tcp" connections via SDP, long lifetime,
> called the "data" and the "meta" socket.
> 
> alice sends on the "data" socket, typically transfer
> message sizes of 512 byte to 32 KiB, say.
> 
> for each message received on the data socket bob then sends an "ack"
> back on the "meta" socket, message sizes of ~32 byte. [*]
> there are a few other messages on both sockets.
> 
> 
> I'd like to have maximum throughput when streaming large messages, but
> (of course) at the same time I'd like to have minimum latency for the
> short messages, and when sending only single requests.  obviously if the
> cpu overhead can be minimized, that won't hurt either ;)
> 
> 
> now, which tunables in /sys/module/ib_sdp/parameters are those
> that will most likely have some effect here?
> 
> an "I don't now what I am tuning here, but I try anyways" approach
> gave me some benefit from using
> recv_poll	 200
> sdp_zcopy_thresh 8192,
> all else left at what the module chose itself.
> 
> 
> pointers to an overview about what those tunables actually do, or any
> recommendations (also for tunables in other modules, potentially, or
> sysctls, tcp or other socket options or whatnot) gladly accepted.
> 
> 
> [*]
> this is, of course, in fact DRBD (see http://www.drbd.org), the large
> messages on the "data" socket are replicated block device writes,
> bob needs to submit that data to its local IO subsystem,
> and only send out the "ack" messages on the "meta" socket
> once the corresponding write has been signalled as completed.
> 
> 


