[openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq

Sean Hefty mshefty at ichips.intel.com
Thu Nov 2 11:13:43 PST 2006


> We had an option to increase the RQ size for QP1 and QP0.
> This might help you too: try increasing IB_MAD_QP_RECV_SIZE.

Actually, dropping the requests helps scalability.

If nothing gets dropped, the backlog of queued requests grows to hundreds of 
thousands, most of which will have timed out before the SA can get around to 
processing them.
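
To illustrate the point, here is a toy simulation, not a model of the real SA or MAD layer. The function name, the rates, the timeout, and the queue limit are all made up for illustration; the only thing taken from the discussion is the idea that a FIFO backlog larger than the server can drain fills up with requests whose senders have already timed out.

```python
from collections import deque

def simulate(arrivals_per_tick, served_per_tick, timeout_ticks, ticks, max_queue=None):
    """Return (useful, wasted, dropped) counts for a FIFO request queue.

    All parameters are hypothetical knobs for this sketch.
    max_queue=None means nothing is ever dropped on receive.
    """
    queue = deque()  # each entry is the tick at which the request arrived
    useful = wasted = dropped = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            if max_queue is not None and len(queue) >= max_queue:
                dropped += 1          # receive queue full: request is dropped
            else:
                queue.append(now)
        for _ in range(served_per_tick):
            if not queue:
                break
            arrived = queue.popleft()
            if now - arrived > timeout_ticks:
                wasted += 1           # sender gave up already; the work is thrown away
            else:
                useful += 1           # response goes out before the sender's timeout
    return useful, wasted, dropped

unbounded = simulate(10, 2, 5, 1000)
bounded = simulate(10, 2, 5, 1000, max_queue=8)
print("unbounded (useful, wasted, dropped):", unbounded)
print("bounded   (useful, wasted, dropped):", bounded)
```

With these made-up numbers the unbounded queue spends nearly all of its service time on stale requests, while the small queue drops most arrivals but answers the ones it keeps in time.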

One option is having the SA (or ib_umad?) return a busy status in response to a 
MAD, but we'd still have to be able to send this response as quickly as requests 
are being received.  We could then limit the number of requests that would be 
queued in the kernel for a user.
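
A minimal sketch of that idea follows. It is not ib_umad or SA code: the class name, the status strings, and the limit are hypothetical, and a real implementation would have to send an actual MAD response carrying a busy status. The sketch only shows the bookkeeping, a per-user cap on queued requests with an immediate busy answer once the cap is reached.

```python
from collections import deque

STATUS_QUEUED = "queued"  # hypothetical status names for this sketch
STATUS_BUSY = "busy"

class BoundedRequestQueue:
    """Per-user request queue that answers 'busy' instead of growing without bound."""

    def __init__(self, limit):
        self.limit = limit        # maximum number of requests held for this user
        self.pending = deque()

    def receive(self, request):
        """Queue the request, or report busy right away if the limit is reached."""
        if len(self.pending) >= self.limit:
            return STATUS_BUSY    # cheap rejection; nothing is stored
        self.pending.append(request)
        return STATUS_QUEUED

    def next_request(self):
        """Hand the oldest queued request to the consumer, or None if idle."""
        return self.pending.popleft() if self.pending else None

q = BoundedRequestQueue(limit=2)
results = [q.receive(f"req-{i}") for i in range(4)]
print(results)
```

The rejection path does no allocation and no queueing, which matters because, as noted above, the busy response has to go out as fast as requests come in.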

Unfortunately, when we are able to run on the cluster, modifying the kernel 
modules isn't an option available to us...

- Sean



