[ofa-general] QoS RFC
Sean Hefty
mshefty at ichips.intel.com
Thu Jul 26 18:11:51 PDT 2007
> 2. Architecture
> ----------------
This is a higher-level approach to the problem, but I came up with the
following QoS relationship hierarchy, where '->' means 'maps to'.
Application Service -> Service ID (or range)
Service ID -> desired QoS
QoS, SGID, DGID, PKey -> SGID, DGID, TClass, FlowLabel, PKey
SGID, DGID, TC, FL, PKey -> SLID, DLID, SL (set if crossing subnets)
SLID, DLID, SL -> MTU, Rate, VL, PacketLifeTime
I use these relationships below:
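To make the hierarchy concrete, here is a purely illustrative
walk-through for a single connection; every value below is made up (the
Service-ID uses the SDP format discussed in section 6):

   SDP connect to TCP port 5001         -> Service ID 0x0000000000011389
   Service ID 0x0000000000011389        -> QoS-Level "bulk-data"
   "bulk-data", SGID, DGID, PKey        -> TClass 0x40, FlowLabel 0, PKey
   SGID, DGID, TClass 0x40, FL 0, PKey  -> SLID, DLID, SL 2
   SLID, DLID, SL 2                     -> MTU 2048, Rate 20 Gb/s, VL 2,
                                           PacketLifeTime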
> 4. IPoIB
> ---------
>
> IPoIB already queries the SA for its broadcast group information. The
> additional functionality required is for IPoIB to provide the
> broadcast group SL, MTU, and RATE in every following PathRecord query
> performed when a new UDAV is needed by IPoIB. We could assign a
> special Service-ID for IPoIB use, but since all communication on the
> same IPoIB interface shares the same QoS-Level, without the ability to
> differentiate it by target service, we can ignore it for simplicity.
Rather than IPoIB specifying SL, MTU, and rate with PR queries, it
should specify TClass and FlowLabel. This is necessary for IPoIB to
span IB subnets.
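As a rough sketch of what that could look like on the host, assuming the
existing ib_sa_path_rec_get() interface and leaving SL/MTU/rate for the
SA to fill in (the qos_tclass/qos_flow_label parameters and their source
are placeholders, not something IPoIB has today):

/* Sketch only: an IPoIB path query carrying TClass/FlowLabel so the SA
 * can return SL/MTU/rate.  qos_tclass/qos_flow_label stand in for
 * whatever policy or broadcast-group value IPoIB ends up using. */
static int ipoib_qos_path_query(struct ipoib_dev_priv *priv,
                                struct ipoib_path *path,
                                u8 qos_tclass, u32 qos_flow_label)
{
        path->pathrec.sgid          = priv->local_gid;
        path->pathrec.pkey          = cpu_to_be16(priv->pkey);
        path->pathrec.numb_path     = 1;
        path->pathrec.traffic_class = qos_tclass;
        path->pathrec.flow_label    = cpu_to_be32(qos_flow_label);
        /* path->pathrec.dgid is already set by the caller */

        return ib_sa_path_rec_get(&ipoib_sa_client, priv->ca, priv->port,
                                  &path->pathrec,
                                  IB_SA_PATH_REC_DGID |
                                  IB_SA_PATH_REC_SGID |
                                  IB_SA_PATH_REC_PKEY |
                                  IB_SA_PATH_REC_NUMB_PATH |
                                  IB_SA_PATH_REC_TRAFFIC_CLASS |
                                  IB_SA_PATH_REC_FLOW_LABEL,
                                  1000, GFP_ATOMIC, path_rec_completion,
                                  path, &path->query);
}

The completion handler would then take SL, MTU, and rate from the
returned PathRecord when building the UDAV.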
> 5. CMA features
> ----------------
>
> The CMA interface supports the Service-ID through the notion of a port
> space used as a prefix to the port_num which is part of the sockaddr
> provided to rdma_resolve_addr(). What is missing is an explicit
> request for a QoS-Class that would allow the ULP (like SDP) to
> propagate a specific request for a class of service. A mechanism for
> providing the QoS-Class is available in the IPv6 address, so we could
> use that address field. Another option is to implement a special
> connection options API for CMA.
>
> What is missing in the CMA is use of the provided QoS-Class and
> Service-ID in the PR/MPR it sends. When a response is obtained, it is
> an existing requirement for the CMA to use the PR/MPR from the
> response in setting up the QP address vector.
I think the RDMA CM needs two solutions, depending on which address
family is used. For IPv6, the existing interface is sufficient, and
works for both IB and iWarp. The RDMA CM only needs to include the TC
and FL as part of its PR query. For IPv4, to remain transport neutral,
I think we should add an rdma_set_option() routine to specify the QoS
field. The RDMA CM would include the QoS field for PR query under this
condition.
For IB, this requires changes to the ib_sa to support the new PR
extensions. I don't think we gain anything having the RDMA CM include
service IDs as part of the query.
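Since rdma_set_option() is only a proposal at this point, the option
names below are invented purely to show the shape of the IPv4 path from
a ULP's perspective:

#include <stdint.h>
#include <rdma/rdma_cma.h>

/* Hypothetical usage of the proposed rdma_set_option(); RDMA_OPTION_ID
 * and RDMA_OPTION_ID_TOS are placeholder names, not an existing
 * librdmacm API. */
static int set_qos_and_resolve(struct rdma_cm_id *id, uint8_t tos)
{
        int ret;

        ret = rdma_set_option(id, RDMA_OPTION_ID, RDMA_OPTION_ID_TOS,
                              &tos, sizeof tos);
        if (ret)
                return ret;

        /* The RDMA CM would fold the value into the TClass/QoS field of
         * the PathRecord query issued during route resolution. */
        return rdma_resolve_route(id, 2000);
}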
> 6. SDP
> -------
>
> SDP uses the CMA for building its connections. The Service-ID for SDP
> is 0x000000000001PPPP, where PPPP are 4 hex digits holding the remote
> TCP/IP port number to connect to. SDP might be provided with the
> SO_PRIORITY socket option. In that case, the value provided should be
> sent to the CMA as the TClass option of that connection.
SDP would specify the QoS through the IPv6 address or the
rdma_set_option() routine.
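In kernel SDP terms that could end up looking something like the
following; rdma_set_qos_class() stands in for whatever final form the
option interface takes, and the sdp_sock fields are shown only for
illustration:

/* Sketch: SDP translating SO_PRIORITY into the proposed per-connection
 * QoS option.  rdma_set_qos_class() is a stand-in, not an existing
 * kernel interface. */
static int sdp_set_priority(struct sdp_sock *ssk, int priority)
{
        u8 tclass = priority & 0xff;    /* clamp to an 8-bit TClass */

        return rdma_set_qos_class(ssk->id, tclass);
}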
> 7. SRP
> -------
>
> The current SRP implementation uses its own CM callbacks (not the
> CMA), so SRP should fill in the Service-ID in the PR/MPR by itself and
> use that information in setting up the QP. The T10 SRP standard leaves
> the SRP Service-ID to be defined by the SRP target I/O Controller
> (though it should also comply with IBTA Service-ID rules). In any
> case, the Service-ID is reported by the I/O Controller in the
> ServiceEntries DM attribute and should be used in the PR/MPR if the
> SA reports its ability to handle QoS PR/MPRs.
I agree.
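Assuming the PathRecord grows a ServiceID field as this RFC proposes,
SRP's query might end up looking roughly like this; the service_id
field, the IB_SA_PATH_REC_SERVICE_ID mask, and the target->... names are
all illustrative, not existing code:

/* Sketch only: service_id and IB_SA_PATH_REC_SERVICE_ID are the proposed
 * PathRecord extensions, not fields in today's ib_sa_path_rec, and the
 * target fields shown are illustrative rather than the exact SRP driver
 * layout. */
static int srp_qos_path_query(struct srp_target_port *target)
{
        target->path.service_id = target->service_id; /* from IOC ServiceEntries */
        target->path.pkey       = cpu_to_be16(target->pkey);
        target->path.numb_path  = 1;

        return ib_sa_path_rec_get(&srp_sa_client, target->ib_dev,
                                  target->port, &target->path,
                                  IB_SA_PATH_REC_DGID |
                                  IB_SA_PATH_REC_SGID |
                                  IB_SA_PATH_REC_PKEY |
                                  IB_SA_PATH_REC_NUMB_PATH |
                                  IB_SA_PATH_REC_SERVICE_ID,
                                  1000, GFP_KERNEL,
                                  srp_path_rec_completion, target,
                                  &target->query);
}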
> 8. iSER
> --------
>
> iSER uses CMA and thus should be very close to SDP. The Service-ID
> for iSER should be TBD.
See RDMA CM and SDP.
> 3.2. PR/MPR query handling:
>
> OpenSM should be able to enforce the provided policy on client
> requests. The overall flow for such requests is: first, the request is
> matched against the defined match rules such that the target QoS-Level
> definition is found. Given the QoS-Level, a path search is performed
> with the restrictions imposed by that level. The following two
> sections describe these steps.
If we use the QoS hierarchy outlined above, I think we can construct
some fairly simple tables to guide our PR selection. The SA may need to
construct the tables starting at the bottom and working up, but I
*think* it could be done. And by distributing the tables, we can
support a more distributed (a la local SA) operation.
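To make the "fairly simple tables" idea concrete, one possible shape for
the host-side lookup structures, following the hierarchy above (entirely
hypothetical, not existing ib_sa or OpenSM structures), would be:

/* Hypothetical table layout for the hierarchy above. */
struct qos_sid_range {          /* Service ID (or range) -> QoS level      */
        u64 sid_first, sid_last;
        u16 qos_level;
};

struct qos_level_entry {        /* QoS level + endpoints -> TClass/FL      */
        u16 qos_level;
        u8  traffic_class;
        u32 flow_label;
};

struct qos_path_entry {         /* SGID/DGID/TClass/FL/PKey -> SL          */
        u8  traffic_class;
        u32 flow_label;
        u16 pkey;
        u8  sl;                 /* set when crossing subnets */
};

struct qos_sl_entry {           /* SLID/DLID/SL -> MTU, rate, VL, lifetime */
        u8 sl;
        u8 mtu;
        u8 rate;
        u8 vl;
        u8 packet_life_time;
};

A local SA would only need the slices of these tables relevant to its
own ports, which is what makes the distributed operation attractive.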
From an administration point of view, I would be happier seeing something where
the administrator defines a QoS level in terms of latency or bandwidth
requirements and relative priority. Then, if desired, the administrator
could provide more details, such as indicating which nodes would use
which services, minimum required MTUs, etc. It would then be up to the
SA to map these requirements to specific TC, FL, SL, VL values.
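As a mock-up of the level of abstraction I mean (this is not an existing
OpenSM syntax, just an illustration):

   qos-level "low-latency-ipc":
       priority      high
       max-latency   10us            # hint, not a guarantee
       apply-to      service-ids 0x0000000000010000-0x000000000001ffff

   qos-level "bulk-storage":
       priority      low
       min-bandwidth 4Gb/s
       apply-to      nodes storage-rack-*, min-mtu 2048

The SA would be free to pick whatever TC, FL, SL, and VL assignments
satisfy these constraints on the current fabric.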
In general, though, I'm personally far less concerned with the QoS
specification interface to the SA than with the operation that takes
place on the hosts.
Comments on using this approach on the host side?
- Sean