[openib-general] ib_local_sa testing and observations.
Moni Levy
monil at voltaire.com
Wed Mar 29 23:46:41 PST 2006
Hi Sean,
We've thought about possible ways of testing the implementation
of ib_local_sa and tried to estimate the load that it would place on
the fabric. We did some math on the number of packets that the SM
would have to handle in a test case of a 1k-node fabric, and it looks
like this would be a pretty heavy load on the SM side. The first,
"bring up" storm will be approximately 1000 paths / 3
paths per packet = 333 RMPP packets. Let's say that the RMPP window is
20; that means 17 more ACKs (RX), so approx 350 packets to handle per
node. If we have 1000 nodes, then the SM will have to handle 350k
packets in 1000 concurrent RMPP sessions. Now we get to implementation
details of the SMs. Do you know how many RMPP packets per second
(maximum) the OSM can handle? Please keep in mind that with RMPP
packets there is a lot of processing on the sender side, such as timers,
window management and ACK/NACK processing. Also, the whole list of
paths has to be recreated for each session (CPU load on the SM
machine). That probably means we'll have a period at the beginning of
the fabric bring up during which the SM will just not be able to
process any queries. That's exactly the period in which all of the IPoIB
interfaces in the nodes will want to join the relevant MC groups,
and those joins will probably not get processed in a reasonable time
(timeout). I'm not even thinking about retransmissions of lost RMPP
packets, 2-3 partitions and lmc > 0. Did you do any tests, or do you
have any ideas for possible simulations that could help verify the above?
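
The back-of-the-envelope numbers above can be rerun for other fabric sizes with a small Python sketch. It is only an illustration under stated assumptions: each node asks for one path per node in the fabric, 3 path records fit in one RMPP packet, and the receiver sends one ACK per window of 20 packets. The function names and the packets-per-second rate are hypothetical placeholders, not measured OSM figures. Because it rounds up rather than down, it gives 334 data packets instead of 333.

```python
import math


def sm_bringup_load(nodes, paths_per_packet=3, rmpp_window=20):
    """Estimate the SM packet load of the bring-up storm.

    Assumptions (not from any spec): every node requests `nodes` paths,
    and one ACK is received per full or partial RMPP window.
    """
    paths_per_node = nodes
    data_packets = math.ceil(paths_per_node / paths_per_packet)  # TX per node
    acks = math.ceil(data_packets / rmpp_window)                 # RX per node
    per_node = data_packets + acks
    total = per_node * nodes  # one RMPP session per node
    return data_packets, acks, per_node, total


def drain_seconds(total_packets, packets_per_second):
    """Time to work through the storm at a hypothetical, fixed SM rate."""
    return total_packets / packets_per_second


if __name__ == "__main__":
    data, acks, per_node, total = sm_bringup_load(1000)
    print(data, acks, per_node, total)  # 334 17 351 351000
```

Plugging a measured packets-per-second figure into drain_seconds() would give a rough lower bound on how long the SM is busy, ignoring retransmissions, multiple partitions and lmc > 0.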
Regards,
Moni