[ofa-general] [PATCH] perftest Add rdma_cm retries

David McMillen davem at systemfabricworks.com
Sun Jul 26 03:32:37 PDT 2009


I am pleased to see the discussion this has raised about possible changes
to the underlying kernel services.  However, this user-mode patch addresses
a real problem in the stack as it stands, and I'd like to see agreement
that it should be applied.  The latest (V3) version of the patch makes the
default action behave fairly well, and adds command-line switches that
allow complete control over the timeouts and retries.
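
For readers who have not looked at the patch, the general shape of a
bounded retry around a resolve step can be sketched as below.  This is only
an illustration and not the code from the patch: resolve_with_retries,
resolve_attempt_fn and flaky_attempt are hypothetical names, and the
callback stands in for a real attempt (for example a call to
rdma_resolve_addr followed by waiting for the matching CM event).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: performs ONE resolve attempt using the given
 * timeout in milliseconds. Returns 0 on success, nonzero on failure. */
typedef int (*resolve_attempt_fn)(void *ctx, int timeout_ms);

/* Make one initial attempt plus up to max_retries further attempts.
 * Returns 0 as soon as an attempt succeeds, otherwise the last error. */
int resolve_with_retries(resolve_attempt_fn attempt, void *ctx,
                         int timeout_ms, int max_retries)
{
    int rc = -1;

    assert(attempt != NULL);
    for (int i = 0; i <= max_retries; i++) {
        rc = attempt(ctx, timeout_ms);
        if (rc == 0)
            break;          /* resolved, stop retrying */
    }
    return rc;
}

/* Stand-in for a real attempt: fails until the counter that ctx points to
 * reaches zero, then succeeds. The timeout is ignored in this stub. */
int flaky_attempt(void *ctx, int timeout_ms)
{
    int *remaining_failures = ctx;

    (void)timeout_ms;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}
```

In a sketch like this, the timeout and the retry count are the two values
that command-line switches would feed in; the defaults are whatever the
caller chooses when no switch is given.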

I don't know enough about the kernel-level code to comment on the best way
to improve it, but I would like to make an observation from the user
level.  User-level code has no way to make an informed decision about the
proper values for timeouts or retry counts.  If you are writing some code
(MPI comes to mind) that you know will impose a difficult load and
therefore requires special tuning, that load actually affects all other
user-level code running on the fabric.  Nothing allows that special tuning
to be known or shared.  I think these settings really need to be taken over
by system-wide parameters in the kernel, to the point that perhaps some
consideration should be given to ignoring the timeout parameters on
rdma_resolve_addr and rdma_resolve_route in favor of kernel parameters.

Dave