[ofa-general] QoS RFC

Sun Aug 5 01:47:10 PDT 2007

Sean Hefty wrote:
> FYI - It is my intention to implement the host side portion of QoS 
> support.  (It's one of my path forward objectives.)  I plan on 
> implementing the host side as outlined below.  If anyone has any 
> comments, I would like to get them as soon as possible.

Sean,

From what I understand from reading your proposal, it is quite 
different from what was suggested in the original RFC. I don't think it 
makes sense to implement the host side before there is agreement 
on the overall solution, namely how the host-side design/code plugs into 
the management scheme on the SM side.

Basically, the SM people have not really reacted to your proposal, which 
is a problem...

One more thing that bothers me is backward compatibility with an SM/SA 
that does not support the not-yet-published IBTA QoS extensions. Were 
you thinking of first probing the SA capabilities to see whether it 
supports QoS path queries, or do you think that is overkill?
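
To make the probing idea concrete, here is a minimal C sketch of "probe first, then fall back". Everything named here is an assumption for illustration: the capability bit, the component-mask bits, and both helper functions are hypothetical, since the real bit would only be defined by the IBTA QoS extensions once they are published.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit; the real position would come from the
 * IBTA QoS extensions, which are not published yet. */
#define SA_CAP_QOS_HYPOTHETICAL (1u << 0)

/* Hypothetical component-mask bits for a PathRecord query. */
#define PR_COMP_BASE       (1u << 0)  /* fields every SA understands */
#define PR_COMP_QOS_FIELDS (1u << 1)  /* QoS-related fields */

/* True if the probed SA capability mask advertises QoS path queries. */
static bool sa_supports_qos(uint32_t sa_capability_mask)
{
    return (sa_capability_mask & SA_CAP_QOS_HYPOTHETICAL) != 0;
}

/* Build the query mask: include QoS fields only for a QoS-aware SA,
 * so an older SM/SA never sees fields it does not know about. */
static uint32_t pr_query_comp_mask(uint32_t sa_capability_mask)
{
    uint32_t mask = PR_COMP_BASE;

    if (sa_supports_qos(sa_capability_mask))
        mask |= PR_COMP_QOS_FIELDS;
    return mask;
}
```

With this shape, the cost of backward compatibility is one capability lookup per SA, cached on the host, rather than a failed query followed by a retry.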

Or.
> Sean Hefty wrote:
>>> 2. Architecture ----------------
>>
>> This is a higher level approach to the problem, but I came up with the
>> following QoS relationship hierarchy, where '->' means 'maps to'.
>>
>> Application Service -> Service ID (or range)
>> Service ID -> desired QoS
>> QoS, SGID, DGID, PKey -> SGID, DGID, TClass, FlowLabel, PKey
>> SGID, DGID, TC, FL, PKey -> SLID, DLID, SL (set if crossing subnets)
>> SLID, DLID, SL -> MTU, Rate, VL, PacketLifeTime
>>
>> I use these relationships below:
>>
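
The first two links of the hierarchy above can be sketched as table lookups. This is only an illustration in C under stated assumptions: the struct names, field widths, and both functions are hypothetical, and the second table is keyed on the QoS value alone, whereas the hierarchy also keys on SGID, DGID, and PKey.

```c
#include <stddef.h>
#include <stdint.h>

/* Service ID (or range) -> desired QoS. Hypothetical layout. */
struct sid_qos_rule {
    uint64_t sid_first;
    uint64_t sid_last;
    uint8_t  qos;
};

/* QoS -> TClass, FlowLabel. Simplified: SGID, DGID, PKey are omitted. */
struct qos_grh_rule {
    uint8_t  qos;
    uint8_t  tclass;
    uint32_t flow_label;
};

/* Return the QoS for a Service ID, or -1 if no rule covers it. */
static int sid_to_qos(const struct sid_qos_rule *rules, size_t n, uint64_t sid)
{
    for (size_t i = 0; i < n; i++)
        if (sid >= rules[i].sid_first && sid <= rules[i].sid_last)
            return rules[i].qos;
    return -1;
}

/* Return the TClass/FlowLabel rule for a QoS value, or NULL if none. */
static const struct qos_grh_rule *
qos_to_grh(const struct qos_grh_rule *rules, size_t n, uint8_t qos)
{
    for (size_t i = 0; i < n; i++)
        if (rules[i].qos == qos)
            return &rules[i];
    return NULL;
}
```

The remaining links (to SLID/DLID/SL, and then to MTU, Rate, VL, PacketLifeTime) would follow the same pattern with wider keys.
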
>>> 4. IPoIB ---------
>>>
>>> IPoIB already queries the SA for its broadcast group information. The 
>>> additional functionality required is for IPoIB to provide the
>>> broadcast group SL, MTU, and RATE in every following PathRecord query
>>> performed when a new UDAV is needed by IPoIB. We could assign a
>>> special Service-ID for IPoIB use but since all communication on the
>>> same IPoIB interface shares the same QoS-Level without the ability to
>>> differentiate it by target service we can ignore it for simplicity.
>>
>> Rather than IPoIB specifying SL, MTU, and rate with PR queries, it 
>> should specify TClass and FlowLabel.  This is necessary for IPoIB to 
>> span IB subnets.
>>
>>> 5. CMA features ----------------
>>>
>>> The CMA interface supports Service-ID through the notion of port
>>> space as a prefix to the port_num which is part of the sockaddr
>>> provided to rdma_resolve_addr(). What is missing is the explicit
>>> request for a QoS-Class that should allow the ULP (like SDP) to
>>> propagate a specific request for a class of service. A mechanism for
>>> providing the QoS-Class is available in the IPv6 address, so we could
>>> use that address field. Another option is to implement a special 
>>> connection options API for CMA.
>>>
>>> The functionality missing from the CMA is the use of the provided QoS-Class
>>> and Service-ID in the sent PR/MPR. When a response is obtained it is
>>> an existing requirement for the CMA to use the PR/MPR from the
>>> response in setting up the QP address vector.
>>
>> I think the RDMA CM needs two solutions, depending on which address 
>> family is used.  For IPv6, the existing interface is sufficient, and 
>> works for both IB and iWarp.  The RDMA CM only needs to include the TC 
>> and FL as part of its PR query.  For IPv4, to remain transport 
>> neutral, I think we should add an rdma_set_option() routine to specify 
>> the QoS field.  The RDMA CM would include the QoS field for PR query 
>> under this condition.
>>
>> For IB, this requires changes to the ib_sa to support the new PR 
>> extensions.  I don't think we gain anything having the RDMA CM include 
>> service IDs as part of the query.
>>
>>> 6. SDP -------
>>>
>>> SDP uses CMA for building its connections. The Service-ID for SDP is
>>> 0x000000000001PPPP, where PPPP are 4 hex digits holding the remote
>>> TCP/IP Port Number to connect to. SDP might be provided with the
>>> SO_PRIORITY socket option. In that case the value provided should be
>>> sent to the CMA as the TClass option of that connection.
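
The Service-ID layout described above (0x000000000001PPPP) can be written down directly. A minimal C sketch, assuming the port is passed in host byte order; the macro and function names are hypothetical and not taken from any existing SDP code.

```c
#include <stdint.h>

/* Fixed upper bits of the SDP Service-ID: 0x000000000001PPPP. */
#define SDP_SERVICE_ID_PREFIX 0x0000000000010000ULL

/* Place the 16-bit remote TCP/IP port in the low four hex digits. */
static uint64_t sdp_service_id(uint16_t tcp_port)
{
    return SDP_SERVICE_ID_PREFIX | (uint64_t)tcp_port;
}
```

For example, port 0x1234 yields the Service-ID 0x0000000000011234.
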
>>
>> SDP would specify the QoS through the IPv6 address or 
>> rdma_set_option() routine.
>>
>>> 7. SRP -------
>>>
>>> Current SRP implementation uses its own CM callbacks (not CMA). So
>>> SRP should fill in the Service-ID in the PR/MPR by itself and use
>>> that information in setting up the QP. The T10 SRP standard defines
>>> the SRP Service-ID to be defined by the SRP target I/O Controller
>>> (but they should also comply with IBTA Service-ID rules). Anyway,
>>> the Service-ID is reported by the I/O Controller in the 
>>> ServiceEntries DM attribute and should be used in the PR/MPR if the
>>> SA reports its ability to handle QoS PR/MPRs.
>>
>> I agree.
>>
>>> 8. iSER --------
>>>
>>> iSER uses CMA and thus should be very close to SDP.
>>> The Service-ID for iSER should be TBD.
>>
>> See RDMA CM and SDP.
>>
>>> 3.2. PR/MPR query handling: OpenSM should be able to enforce the
>>> provided policy on client request. The overall flow for such requests
>>> is: first the request is matched against the defined match rules such
>>> that the target QoS-Level definition is found. Given the QoS-Level a
>>> path(s) search is performed with the given restrictions imposed by
>>> that level. The following two sections describe these steps.
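
The second step of that flow, the path search under the restrictions of a QoS-Level, might look like the following C sketch. The restriction set (an SL to assign plus minimum MTU and rate) and all type and function names are assumptions for illustration; MTU and rate are plain numbers in bytes and Gb/s here, not the IB wire encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical QoS-Level: an SL to assign plus minimum requirements. */
struct qos_level {
    uint8_t  sl;
    uint16_t min_mtu;   /* bytes */
    uint16_t min_rate;  /* Gb/s  */
};

/* Hypothetical candidate path; not the real PathRecord layout. */
struct path_rec {
    uint16_t slid;
    uint16_t dlid;
    uint8_t  sl;
    uint16_t mtu;       /* bytes */
    uint16_t rate;      /* Gb/s  */
};

/* Copy every candidate that satisfies the level into out[], stamping
 * the level's SL on it. out[] must have room for n entries.
 * Returns the number of paths kept. */
static size_t select_paths(const struct qos_level *lvl,
                           const struct path_rec *in, size_t n,
                           struct path_rec *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].mtu < lvl->min_mtu || in[i].rate < lvl->min_rate)
            continue;
        out[kept] = in[i];
        out[kept].sl = lvl->sl;
        kept++;
    }
    return kept;
}
```
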
>>
>> If we use the QoS hierarchy outlined above, I think we can construct 
>> some fairly simple tables to guide our PR selection.  The SA may need 
>> to construct the tables starting at the bottom and working up, but I 
>> *think* it could be done.  And by distributing the tables, we can 
>> support a more distributed (a la local SA) operation.
>>
>> From an administration standpoint, I would be happier seeing something 
>> where the administrator defines a QoS level in terms of latency or 
>> bandwidth requirements and relative priority.  Then, if desired, the 
>> administrator could provide more details, such as indicating which 
>> nodes would use which services, minimum required MTUs, etc.  It would 
>> then be up to the SA to map these requirements to specific TC, FL, SL, 
>> VL values.
>>
>> In general, though, I'm personally far less concerned with the QoS 
>> specification interface to the SA than with the operation that takes 
>> place on the hosts.
>>
>> Comments on using this approach on the host side?