[ewg] Scalable reliable connection

Michael S. Tsirkin mst at dev.mellanox.co.il
Mon Jul 30 05:50:54 PDT 2007


Here's some background on what SRC is.  This is basically slide 6 in Dror's
talk, for those who missed it.

 * * *

SRC is an extension supported by recent Mellanox hardware
which is geared toward reducing the number of QPs
required for all-to-all communication on systems
with a high number of jobs per node.

===================================================================
Motivation:
===================================================================
Given N nodes with J jobs per node, the number of QPs required
for all-to-all communication is:

With RC:
		O((N * J) ^ 2)

	Since each job out of O(N * J) jobs must create a single QP
	to communicate with each one of O(N * J) other jobs.

With SRC:
		O(N ^ 2 * J)

	This is achieved by using a single send queue (per job, out of O(N * J) jobs)
	to send data to all J jobs running on a specific node (out of O(N) nodes).
	The hardware uses a new "SRQ number" field in the packet header to
	multiplex receive WRs and WCs to the private memory of each job.

This is a similar idea to IB RD.
Q: Why not use RD then?
A: Because no hardware supports it.
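
To put numbers on the two bounds above, here is a rough sketch. The function
names are made up for illustration, and it assumes that only jobs on *other*
nodes are counted (intra-node traffic is left out), so the exact totals are
an assumption; only the O() terms come from the description above.

```python
def rc_qp_count(nodes, jobs_per_node):
    """QPs for all-to-all with RC: one QP per (job, remote job) pair."""
    total_jobs = nodes * jobs_per_node
    remote_jobs = (nodes - 1) * jobs_per_node  # assumption: skip same-node peers
    return total_jobs * remote_jobs


def src_qp_count(nodes, jobs_per_node):
    """QPs for all-to-all with SRC: one send queue per (job, remote node) pair."""
    total_jobs = nodes * jobs_per_node
    remote_nodes = nodes - 1  # assumption: skip the job's own node
    return total_jobs * remote_nodes


if __name__ == "__main__":
    n, j = 128, 8
    print("RC :", rc_qp_count(n, j))
    print("SRC:", src_qp_count(n, j))
```

Under these assumptions the saving is exactly a factor of J, which matches
the difference between O((N * J) ^ 2) and O(N ^ 2 * J).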

Details:

===================================================================
Verbs extension:
===================================================================

- There is a new transport/QP type "SRC".
- There is a new object type "SRC domain"
- Each SRQ gets new (optional) attributes:
        SRC domain
        SRC SRQ number
        SRC CQ
  SRQ must have either all 3 of these or none of these attributes

- QPs of type SRC have all the same attributes as regular RC QPs
  connected to SRQ, except that:
  A. Each SRC QP has a new required attribute "SRC domain"
  B. SRC QPs do *not* have "SRQ" attribute
  	(do not have a specific SRQ associated with them)
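
The object rules above can be modelled roughly as follows. This is only a
sketch of the constraints, not the verbs API: the class names SrcDomain, Srq
and SrcQp and all of their fields are hypothetical, and the CQ is just a
string placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SrcDomain:
    """Hypothetical stand-in for the new "SRC domain" object."""
    domain_id: int


@dataclass
class Srq:
    """An SRQ carrying the three optional SRC attributes."""
    src_domain: Optional[SrcDomain] = None
    src_srq_number: Optional[int] = None
    src_cq: Optional[str] = None  # placeholder for a CQ handle

    def __post_init__(self):
        # An SRQ must have either all 3 SRC attributes or none of them.
        attrs = (self.src_domain, self.src_srq_number, self.src_cq)
        present = sum(a is not None for a in attrs)
        if present not in (0, 3):
            raise ValueError("SRQ needs all three SRC attributes or none")

    @property
    def is_src(self):
        return self.src_domain is not None


@dataclass
class SrcQp:
    """An SRC QP: the SRC domain is required, and there is no SRQ field."""
    src_domain: SrcDomain
```

Note that SrcQp deliberately has no reference to any SRQ; the only thing
tying the two together is the shared SRC domain.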

===================================================================
Protocol extension:
===================================================================
SRC QP behaviour: Requestor
- The post send WR for this QP type is extended with an SRQ number field.
  This number is sent as part of the packet header.
- SRC packets follow the rules for RC packets on the wire, exactly.
  What differs is their handling at the responder side.

SRC QP behaviour: Responder
Each incoming packet passes transport checks with respect
to the SRC QP, following RC rules, exactly.

After this, SRQ number in packet header is used to look up
a specific SRQ. SRC domain of the resulting SRQ must be equal
to SRC domain of the QP, otherwise a NAK is sent,
and QP moves to error state.

If the SRC domains match, receive WR and receive WC processing
are as follows:

- RC Send
  - Rather than using SRQ to which the QP is attached,
    SRQ is looked up by SRQ number in the packet.
    Receive WR is taken from this SRQ.
  - Completions are generated on the CQ specified in the SRQ

- RDMA/Atomic
  - Rather than using PD to which the QP is attached,
    SRQ is looked up by SRQ number in the packet.
    PD of this SRQ is used for protection checks.
===================================================================
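
The responder rules above can be sketched like this. It is a toy model and
not how the hardware is implemented: all names are hypothetical, the RC
transport checks are assumed to have passed already, and two cases the
description does not cover are filled in as labelled assumptions (an unknown
SRQ number is treated like a domain mismatch, and an empty SRQ gives an RNR
NAK as it would for RC with an SRQ).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Srq:
    number: int
    domain: int   # id of the SRC domain this SRQ belongs to
    cq: list      # completions for this SRQ are appended here
    pd: str       # protection domain of this SRQ
    recv_wrs: deque = field(default_factory=deque)


@dataclass
class SrcQp:
    domain: int   # id of the SRC domain of this QP
    state: str = "RTS"


def handle_packet(qp, srq_table, srq_number, opcode, payload=None):
    """Return (response, pd_used) for a packet that passed RC checks."""
    srq = srq_table.get(srq_number)
    # Assumption: an unknown SRQ number is handled like a domain mismatch.
    if srq is None or srq.domain != qp.domain:
        qp.state = "ERROR"
        return ("NAK", None)
    if opcode == "SEND":
        if not srq.recv_wrs:
            # Assumption: same as RC with an SRQ that has no receive WRs.
            return ("RNR_NAK", None)
        wr = srq.recv_wrs.popleft()      # receive WR comes from the looked-up SRQ
        srq.cq.append((wr, payload))     # completion goes to the SRQ's CQ
        return ("ACK", None)
    # RDMA/Atomic: protection checks use the PD of the looked-up SRQ.
    return ("ACK", srq.pd)
```

The point to notice is that nothing on the receive path comes from the QP
itself except the SRC domain: the receive WR, the CQ and the PD are all
taken from the SRQ named in the packet.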
 
-- 
MST


