[ewg] RE: [ofa-general] Scalable reliable connection
Tang, Changqing
changquing.tang at hp.com
Tue Jul 31 09:12:09 PDT 2007
A send queue can serve at most J jobs, all within a single node. Is it
possible to make a single send queue serve all jobs on all nodes?
--CQ
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> Michael S. Tsirkin
> Sent: Monday, July 30, 2007 7:51 AM
> To: Gleb Natapov
> Cc: Pavel Shamis; ewg at lists.openfabrics.org; Michael S.
> Tsirkin; general at lists.openfabrics.org; Ishai Rabinovitz
> Subject: [ofa-general] Scalable reliable connection
>
>
> Here's some background on what SRC is. This is basically
> slide 6 in Dror's talk, for those that missed the talk.
>
> * * *
>
> SRC is an extension supported by recent Mellanox hardware
> which is geared toward reducing the number of QPs required
> for all-to-all communication on systems with a high number of
> jobs per node.
>
> ===================================================================
> Motivation:
> ===================================================================
> Given N nodes with J jobs per node, the number of QPs required
> for all-to-all communication is:
>
> With RC:
> O((N * J) ^ 2)
>
> Since each job out of O(N * J) jobs must create a single QP
> to communicate with each one of O(N * J) other jobs.
>
> With SRC:
> O(N ^ 2 * J)
>
> This is achieved by using a single send queue (per job,
> out of O(N * J) jobs)
> to send data to all J jobs running on a specific node
> (out of O(N) nodes).
> Hardware uses new "SRQ number" field in packet header to
> multiplex receive WRs and WCs to private memory of each job.
>
> This is a similar idea to IB RD.
> Q: Why not use RD then?
> A: Because no hardware supports it.
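The two growth rates in the motivation can be compared with a few lines of Python. This is only a rough sketch: it uses the leading terms (N * J)^2 and N^2 * J directly, ignores constant factors and self-connections, and the function names are made up for illustration.

```python
def rc_qp_count(nodes: int, jobs_per_node: int) -> int:
    """Leading term for RC: each of N*J jobs needs a QP per each of N*J jobs."""
    total_jobs = nodes * jobs_per_node
    return total_jobs * total_jobs


def src_qp_count(nodes: int, jobs_per_node: int) -> int:
    """Leading term for SRC: each of N*J jobs needs one send queue per node."""
    total_jobs = nodes * jobs_per_node
    return total_jobs * nodes


if __name__ == "__main__":
    # Example sizes are arbitrary; they are not taken from the original mail.
    for n, j in [(64, 8), (1024, 16)]:
        rc = rc_qp_count(n, j)
        src = src_qp_count(n, j)
        print(f"N={n} J={j}: RC={rc} SRC={src} ratio={rc // src}")
```

Under these assumptions the ratio between the two counts is exactly J, so the saving grows with the number of jobs per node.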
>
> Details:
>
> ===================================================================
> Verbs extension:
> ===================================================================
>
> - There is a new transport/QP type "SRC".
> - There is a new object type "SRC domain"
> - Each SRQ gets new (optional) attributes:
> SRC domain
> SRC SRQ number
> SRC CQ
> SRQ must have either all 3 of these or none of these attributes
>
> - QPs of type SRC have all the same attributes as regular RC QPs
> connected to SRQ, except that:
> A. Each SRC QP has a new required attribute "SRC domain"
> B. SRC QPs do *not* have "SRQ" attribute
> (do not have a specific SRQ associated with them)
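The attribute rules above can be modelled as plain data structures. The sketch below is not a real verbs API: the class names, field names, and the use of strings for PD and CQ handles are all hypothetical, chosen only to show the "all 3 or none" rule for SRQs and the required SRC domain on SRC QPs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SrcDomain:
    """Hypothetical handle for the new "SRC domain" object."""
    domain_id: int


@dataclass
class Srq:
    """SRQ with the three optional SRC attributes (illustrative names)."""
    pd: str
    src_domain: Optional[SrcDomain] = None
    src_srq_number: Optional[int] = None
    src_cq: Optional[str] = None

    def __post_init__(self) -> None:
        # An SRQ must carry all three SRC attributes or none of them.
        attrs = (self.src_domain, self.src_srq_number, self.src_cq)
        given = sum(a is not None for a in attrs)
        if given not in (0, 3):
            raise ValueError("SRQ needs all three SRC attributes or none")

    @property
    def is_src(self) -> bool:
        return self.src_domain is not None


@dataclass
class SrcQp:
    """SRC QP: an SRC domain is required and there is no SRQ attribute."""
    pd: str
    src_domain: SrcDomain

    def __post_init__(self) -> None:
        if self.src_domain is None:
            raise ValueError("SRC QP requires an SRC domain")
```

For example, `Srq(pd="pd0", src_domain=SrcDomain(1))` raises `ValueError` because only one of the three SRC attributes is set.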
>
> ===================================================================
> Protocol extension:
> ===================================================================
> SRC QP behaviour: Requestor
> - Post send WR for this QP type is extended with an SRQ number field.
> This number is sent as part of the packet header.
> - SRC packets follow the rules for RC packets on the wire, exactly.
> What differs is their handling at the responder side.
>
> SRC QP behaviour: Responder
> Each incoming packet passes transport checks with respect to
> the SRC QP, following RC rules, exactly.
>
> After this, SRQ number in packet header is used to look up a
> specific SRQ. SRC domain of the resulting SRQ must be equal
> to SRC domain of the QP, otherwise a NAK is sent, and QP
> moves to error state.
>
> If the SRC domains match, receive WR and receive WC
> processing are as follows:
>
> - RC Send
> - Rather than using SRQ to which the QP is attached,
> SRQ is looked up by SRQ number in the packet.
> Receive WR is taken from this SRQ.
> - Completions are generated on the CQ specified in the SRQ
>
> - RDMA/Atomic
> - Rather than using PD to which the QP is attached,
> SRQ is looked up by SRQ number in the packet.
> PD of this SRQ is used for protection checks.
> ===================================================================
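The responder rules above can be walked through with a small software model. This is a sketch only, not how the hardware is implemented. All names are hypothetical, and three behaviours are assumptions that the text does not state: an unknown SRQ number is treated like a domain mismatch, an empty SRQ yields an RNR NAK as with ordinary RC and SRQ, and a failed PD check produces a NAK and moves the QP to the error state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Srq:
    src_domain: int
    pd: str
    recv_wrs: List[int] = field(default_factory=list)        # posted receive WR ids
    cq: List[Tuple[int, bytes]] = field(default_factory=list)  # completions on the SRQ's CQ


@dataclass
class SrcQp:
    src_domain: int
    state: str = "RTS"


def lookup_srq(qp: SrcQp, srq_table: Dict[int, Srq], srq_number: int) -> Optional[Srq]:
    """Find the SRQ named in the packet header and check its SRC domain."""
    # RC transport checks against the QP are assumed to have passed already.
    srq = srq_table.get(srq_number)
    # Assumption: an unknown SRQ number is handled like a domain mismatch.
    if srq is None or srq.src_domain != qp.src_domain:
        qp.state = "ERROR"
        return None
    return srq


def handle_send(qp: SrcQp, srq_table: Dict[int, Srq], srq_number: int, payload: bytes) -> str:
    """Send: take a receive WR from the looked-up SRQ, complete on its CQ."""
    srq = lookup_srq(qp, srq_table, srq_number)
    if srq is None:
        return "NAK"
    if not srq.recv_wrs:
        return "RNR_NAK"  # assumption: same as RC with an empty SRQ
    wr_id = srq.recv_wrs.pop(0)
    srq.cq.append((wr_id, payload))
    return "ACK"


def handle_rdma(qp: SrcQp, srq_table: Dict[int, Srq], srq_number: int, region_pd: str) -> str:
    """RDMA/Atomic: protection check uses the PD of the looked-up SRQ."""
    srq = lookup_srq(qp, srq_table, srq_number)
    if srq is None:
        return "NAK"
    if srq.pd != region_pd:
        qp.state = "ERROR"  # assumption: handled as an RC access error
        return "NAK"
    return "ACK"
```

With one SRQ per receiving job in `srq_table`, a single SRC QP in the same SRC domain can deliver to any of them, which is what lets one send queue on the requestor reach all J jobs on the node.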
>
> --
> MST