[ofa-general] Re: Scalable reliable connection

Gleb Natapov glebn at voltaire.com
Tue Jul 31 04:56:05 PDT 2007


On Mon, Jul 30, 2007 at 03:50:54PM +0300, Michael S. Tsirkin wrote:
> With SRC:
> 		O(N ^ 2 * J)
> 
> 	This is achieved by using a single send queue (per job, out of O(N * J) jobs)
> 	to send data to all J jobs running on a specific node (out of O(N) nodes).
> 	Hardware uses new "SRQ number" field in packet header to
> 	multiplex receive WRs and WCs to private memory of each job.
> 
But since the send queue cannot be used for receiving packets, additional
receive QPs have to be created, one per job, so with SRC it is actually
    O(N ^ 2 * J + N * J)
unless I am missing something.
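To make the terms being compared concrete, here is a back-of-the-envelope sketch. It is only an illustration under assumptions: N nodes, J jobs (processes) per node, all-to-all traffic, one SRC send QP per job per node as in the quoted description, and one extra receive QP per job as argued above. The RC baseline (one QP per ordered pair of processes) and all function names are hypothetical additions for comparison. They are not taken from the SRC proposal.

```python
def rc_qp_count(nodes: int, jobs_per_node: int) -> int:
    """Hypothetical baseline: fully connected RC, one QP per peer process."""
    procs = nodes * jobs_per_node
    return procs * (procs - 1)


def src_send_qp_count(nodes: int, jobs_per_node: int) -> int:
    """SRC as quoted: each job keeps one send QP per node (the N^2 * J term).

    Assumption: the job's own node is counted too, to match O(N^2 * J).
    """
    procs = nodes * jobs_per_node
    return procs * nodes


def src_total_qp_count(nodes: int, jobs_per_node: int) -> int:
    """Send QPs plus one receive QP per job (the extra N * J term)."""
    return src_send_qp_count(nodes, jobs_per_node) + nodes * jobs_per_node


if __name__ == "__main__":
    n, j = 4, 8
    print("RC:", rc_qp_count(n, j))                # 32 * 31 = 992
    print("SRC send:", src_send_qp_count(n, j))    # 32 * 4 = 128
    print("SRC total:", src_total_qp_count(n, j))  # 128 + 32 = 160
```

Plugging in different N and J shows how large each term is relative to the others as the cluster grows.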

> This is a similar idea to IB RD.
Except that with RD there is no need to jump through hoops and create
separate QP for sending and receiving packets in order to achieve
scalability.

> Q: Why not use RD then?
> A: Because no hardware supports it.
Wrong answer :) There was no HW for SRC either, but Mellanox decided to
implement SRC instead of RD. The reasons Dror provided for this were:
a) RD is hard to do
 Not a very sound reason IMO. Not doing RD just pushes the
 complexity from HW to SW. And there are HW implementations of RD,
 though not for IB.
b) RD, as defined by the IB spec, will not achieve good performance
 This reason is serious, but can the spec be changed to allow for a
 high-performance implementation? Spec compliance is not something that
 has stopped Mellanox from doing things before :)

Thanks for the protocol explanation.

--
			Gleb.


