[ofa-general] Scalable reliable connection

Tang, Changqing changquing.tang at hp.com
Tue Jul 31 09:12:09 PDT 2007


A send queue can serve at most J jobs, all within a single node. Is it
possible to make a single send queue serve all jobs on all nodes?

--CQ 

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> Michael S. Tsirkin
> Sent: Monday, July 30, 2007 7:51 AM
> To: Gleb Natapov
> Cc: Pavel Shamis; ewg at lists.openfabrics.org; Michael S. 
> Tsirkin; general at lists.openfabrics.org; Ishai Rabinovitz
> Subject: [ofa-general] Scalable reliable connection
> 
> 
> Here's some background on what SRC is.  This is basically 
> slide 6 in Dror's talk, for those that missed the talk.
> 
>  * * *
> 
> SRC is an extension supported by recent Mellanox hardware 
> which is geared toward reducing the number of QPs required 
> for all-to-all communication on systems with a high number of 
> jobs per node.
> 
> ===================================================================
> Motivation:
> ===================================================================
> Given N nodes with J jobs per node, number of QPs required 
> for all-to-all communication is:
> 
> With RC:
> 		O((N * J) ^ 2)
> 
> 	Since each job out of O(N * J) jobs must create a single QP
> 	to communicate with each one of O(N * J) other jobs.
> 
> With SRC:
> 		O(N ^ 2 * J)
> 
> 	This is achieved by using a single send queue (per job, out of
> 	O(N * J) jobs) to send data to all J jobs running on a specific
> 	node (out of O(N) nodes).
> 	Hardware uses new "SRQ number" field in packet header to
> 	multiplex receive WRs and WCs to private memory of each job.
> 
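
To make the two counts above concrete, here is a minimal sketch. It
assumes (my simplification, not stated in the original) that jobs on the
same node do not use QPs to talk to each other, so only remote peers are
counted. The function names are made up for illustration.

```python
def rc_qp_count(nodes: int, jobs_per_node: int) -> int:
    """QPs needed with RC: every job opens one QP per remote job."""
    total_jobs = nodes * jobs_per_node
    remote_jobs = (nodes - 1) * jobs_per_node  # assumption: same-node peers excluded
    return total_jobs * remote_jobs


def src_qp_count(nodes: int, jobs_per_node: int) -> int:
    """Send queues needed with SRC: every job opens one per remote node."""
    total_jobs = nodes * jobs_per_node
    remote_nodes = nodes - 1  # assumption: same-node peers excluded
    return total_jobs * remote_nodes


if __name__ == "__main__":
    n, j = 64, 8
    print("RC :", rc_qp_count(n, j))
    print("SRC:", src_qp_count(n, j))
```

Under these assumptions the RC count is exactly J times the SRC count,
which matches the O((N * J) ^ 2) versus O(N ^ 2 * J) figures.
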
> This is a similar idea to IB RD.
> Q: Why not use RD then?
> A: Because no hardware supports it.
> 
> Details:
> 
> ===================================================================
> Verbs extension:
> ===================================================================
> 
> - There is a new transport/QP type "SRC".
> - There is a new object type "SRC domain"
> - Each SRQ gets new (optional) attributes:
>         SRC domain
>         SRC SRQ number
>         SRC CQ
>   An SRQ must have either all 3 of these attributes or none of them
> 
> - QPs of type SRC have all the same attributes as regular RC QPs
>   connected to SRQ, except that:
>   A. Each SRC QP has a new required attribute "SRC domain"
>   B. SRC QPs do *not* have "SRQ" attribute
>   	(do not have a specific SRQ associated with them)
> 
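
The attribute rules above can be sketched as a small object model. This
is only an illustration in Python, not the verbs API: the class and
field names (Srq, SrcQp, src_domain, src_srq_number, src_cq) are
hypothetical, and handles are represented as plain ints and strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Srq:
    pd: str                               # protection domain handle (placeholder type)
    src_domain: Optional[int] = None      # optional SRC attribute
    src_srq_number: Optional[int] = None  # optional SRC attribute
    src_cq: Optional[str] = None          # optional SRC attribute

    def __post_init__(self) -> None:
        # An SRQ must carry all three SRC attributes or none of them.
        attrs = (self.src_domain, self.src_srq_number, self.src_cq)
        present = sum(a is not None for a in attrs)
        if present not in (0, 3):
            raise ValueError("SRQ needs all three SRC attributes or none")

    @property
    def is_src(self) -> bool:
        return self.src_domain is not None


@dataclass
class SrcQp:
    # Required for SRC QPs. Note there is deliberately no "srq" field:
    # an SRC QP is not tied to one specific SRQ.
    src_domain: int
```
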
> ===================================================================
> Protocol extension:
> ===================================================================
> SRC QP behaviour: Requestor
> - The post send WR for this QP type is extended with an SRQ number field.
>   This number is sent as part of the packet header.
> - SRC packets follow the rules for RC packets on the wire exactly.
>   What differs is how they are handled at the responder side.
> 
> SRC QP behaviour: Responder
> Each incoming packet passes transport checks with respect to 
> the SRC QP, following RC rules, exactly.
> 
> After this, SRQ number in packet header is used to look up a 
> specific SRQ. SRC domain of the resulting SRQ must be equal 
> to SRC domain of the QP, otherwise a NAK is sent, and QP 
> moves to error state.
> 
> If the SRC domains match, receive WR and receive WC 
> processing are as follows:
> 
> - RC Send
>   - Rather than using SRQ to which the QP is attached,
>     SRQ is looked up by SRQ number in the packet.
>     Receive WR is taken from this SRQ.
>   - Completions are generated on the CQ specified in the SRQ
> 
> - RDMA/Atomic
>   - Rather than using PD to which the QP is attached,
>     SRQ is looked up by SRQ number in the packet.
>     PD of this SRQ is used for protection checks.
> ===================================================================
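
The responder steps above can be modelled roughly as follows. This is a
toy sketch, not how the hardware does it: all names (Srq, SrcQp,
handle_packet, the opcode strings) are hypothetical, the RC transport
checks are assumed to have passed already, and two cases the original
does not cover are filled in by assumption and marked in the comments.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Srq:
    number: int
    domain: int
    pd: str
    cq: list = field(default_factory=list)         # completions land here
    recv_wrs: deque = field(default_factory=deque)  # posted receive WRs


@dataclass
class SrcQp:
    domain: int
    state: str = "RTS"


def handle_packet(qp, srq_table, srq_number, opcode, payload=None):
    """Return (response, pd_used_for_protection_checks)."""
    srq = srq_table.get(srq_number)
    # Assumption: an unknown SRQ number is treated like a domain mismatch.
    if srq is None or srq.domain != qp.domain:
        qp.state = "ERROR"
        return ("NAK", None)
    if opcode == "SEND":
        if not srq.recv_wrs:
            # Assumption: empty SRQ behaves as with RC (receiver not ready).
            return ("RNR_NAK", None)
        wr = srq.recv_wrs.popleft()     # receive WR comes from the looked-up SRQ
        srq.cq.append((wr, payload))    # completion goes to the SRQ's CQ
        return ("ACK", None)
    # RDMA/Atomic: protection checks use the PD of the looked-up SRQ.
    return ("ACK", srq.pd)
```

For example, with a QP in domain 7 and an SRQ number that maps to an SRQ
in domain 9, handle_packet returns a NAK and leaves the QP in the error
state, while a Send to an SRQ in domain 7 consumes a receive WR from
that SRQ and puts the completion on its CQ.
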
>  
> --
> MST


