[openib-general] [PATCH 0/4] SA path record caching

Sean Hefty sean.hefty at intel.com
Wed Feb 1 09:35:01 PST 2006


>Aren't we creating a monster here??? if this is SA replica which should
>work for scale from day one, lets call it this way and see how to reach
>there.

The cache update window is configurable.  What we don't know is how
often the SA would be queried to establish connections without a local
cache present.  Based on information from SilverStorm, the cache should
work well in practice.

What I think we'd like is a userspace cache hierarchy/distributed SA;
however, the time needed to develop these does not fit any of the Path
Forward schedules.

Having a mechanism where clients could ask if there have been any
updates would also work, but I didn't see a way to do this without
modifying the SA node.


>+ neither MVAPICH nor OpenMPI are using path query

The national labs want all path records for their routing algorithms.  I
believe that the problems here were API issues that made connecting
difficult.  As a result, most applications just hard-coded everything.


>+ OpenMPI is opening its connections "per demand" that is only if rank I
>attempts to send a message to rank J then I connects to J
>
>+ even MPIs that connect all-to-all in an N ranks JOB would do only
>n(n-1)/2 path queries, so the aggregated load on the SA is half
>what the all-to-all caching scheme is generating

It would be better to issue a single query for all path records, and
discard those not needed, than to issue separate path record queries.
This is what the cache does.  The difference is 1000 queries versus
roughly 500,000 queries (n(n-1)/2 for n = 1000).  The total number of
MADs generated by the SA is still lower using a single query to return
all path records.
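
As a rough check on those query counts, here is a minimal sketch.  It
assumes a 1000-node fabric (the node count is inferred from the figures
above, not stated anywhere), one get-all query per node for the cache, and
one path record query per connected pair otherwise.  The function names are
made up for illustration.

```python
def bulk_queries(n: int) -> int:
    """SA queries if each of n nodes issues one query for all path records."""
    return n


def pairwise_queries(n: int) -> int:
    """SA queries if each pair among n ranks resolves its path once: n(n-1)/2."""
    return n * (n - 1) // 2


n = 1000  # assumed node count
print(bulk_queries(n), pairwise_queries(n))  # prints: 1000 499500
```

This only counts queries.  It does not model how many MADs the SA sends
per response, so it says nothing by itself about the total MAD count.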

- Sean



