[ofw][patch][ND provider] Improving latency of ms-mpi

Sean Hefty sean.hefty at intel.com
Thu Aug 6 14:27:13 PDT 2009


>Max inline data is not an in/out parameter to the INDConnector::CreateEndpoint
>method.  I don't know if it makes sense to have it be an input parameter.
>Aren't the proper tuning points dependent on the HCA, rather than the app?

This should be an app-controlled parameter.  Using the max inline is not
guaranteed to be faster unless all of your sends happen to be that exact size.

Can someone provide a list of the drawbacks of using a larger max inline size?
I believe it increases the amount of memory required by the QP, and increases
the size of the transfers to the HCA across the PCI bus.  Latency benchmarks may
not care, but real application performance should be affected.

>Assuming that the tuning points are specific to the HCA models, does it make
>sense to always allocate 400 bytes?  Is it always faster to inline 400 bytes
>than to DMA the data for all HCAs (InfiniHost 3 LX, EX, ConnectX, etc?)  It
>seems to me that having the inline data controlled by the HCA driver rather
>than the ND provider would make more sense, and allow the HCA driver to
>optimize the sweet spot.

At the very least this should be tunable, but you really want this per
application, not as a system-wide setting.

- Sean
