[libfabric-users] interesting first results of my benchmark

Hefty, Sean sean.hefty at intel.com
Thu May 23 13:21:23 PDT 2019


> The way I interpret these results is that there must be some form of underlying
> synchronization going on that I am not aware of. I have no idea whether this is the
> doing of libfabric or the InfiniBand protocol; however, since ibcongest without flow
> control appears to behave similarly (though not linearly) to the benchmark, I tend to
> assume that this is the doing of the InfiniBand protocol. So some questions arise: What
> is causing this synch? Can I turn this synch off? Can I do it through libfabric? Is it
> possible to implement manual routing within libfabric? If anyone could share some
> insight on this issue I would be very grateful.

IB does not have any underlying sync.  You can't access the IB subnet setup through libfabric; you would need to use IB-specific management tools to make adjustments there.  IB routing would need to be configured through the subnet manager (SM).

Personally, I would start by examining the behavior of the app.  Maybe re-run the experiment with only 2 clients and 2 servers, or whatever the minimum can be, and see if you can reproduce the same performance drop-off.  There may be an unexpected interaction that results in the performance of everyone degrading to that of the slowest connection.  Specifically, I would look at the impact from this comment:

"all clients queue a defined number of fi_write() per endpoint and once a completion arrives the respective endpoint is looked up and a new fi_write is enqueued for that endpoint"
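
As a baseline for that check, here is a minimal Python sketch of the reposting pattern described in the comment.  It is a toy model, not libfabric code: the function name simulate() and the parameters latencies, window, and horizon are hypothetical, and each endpoint is assumed to complete its writes after a fixed per-endpoint latency with no shared resources.  Under those assumptions the endpoints stay independent, so if a real run shows every endpoint dropping to the rate of the slowest one, something not captured here (a shared queue, a shared completion path, the fabric) would be worth investigating.

```python
import heapq


def simulate(latencies, window, horizon):
    """Toy model of 'keep `window` writes in flight per endpoint'.

    latencies[i] is the assumed (hypothetical) time for one write on
    endpoint i to complete. Every completion immediately triggers one
    new write on the same endpoint, as in the quoted comment.
    Returns the number of completions seen per endpoint up to `horizon`.
    """
    events = []  # min-heap of (completion_time, endpoint_index)

    # Initially queue `window` writes on every endpoint.
    for ep, lat in enumerate(latencies):
        for _ in range(window):
            heapq.heappush(events, (lat, ep))

    completed = [0] * len(latencies)
    while events:
        t, ep = heapq.heappop(events)
        if t > horizon:
            break
        completed[ep] += 1
        # Look up the endpoint of the completion and repost one write.
        heapq.heappush(events, (t + latencies[ep], ep))
    return completed


if __name__ == "__main__":
    # One fast and one slow endpoint, two writes in flight on each.
    print(simulate(latencies=[1, 4], window=2, horizon=100))
```

Comparing per-endpoint completion counts from the real benchmark against this kind of independent baseline, starting with the 2-client, 2-server case, is one way to see whether the fast connections are being held back by the slow ones.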

- Sean

