[ofa-general] QoS RFC
Or Gerlitz
ogerlitz at voltaire.com
Sun Aug 5 01:47:10 PDT 2007
Sean Hefty wrote:
> FYI - It is my intention to implement the host side portion of QoS
> support. (It's one of my path forward objectives.) I plan on
> implementing the host side as outlined below. If anyone has any
> comments, I would like to get them as soon as possible.
Sean,
From what I understand from reading your proposal, it is quite
different from what was suggested in the original RFC. I don't think it
makes sense to implement the host side of this before there's agreement
on the overall solution, namely how the host-side design/code plugs into
the management scheme on the SM side.
Basically, the SM people have not really reacted to your proposal, which
is a problem...
One more thing that bothers me is backward compatibility with an SM/SA
that does not support the not-yet-published IBTA QoS extensions. Were
you thinking of first probing the SA capabilities to see whether it
supports QoS path queries, or do you think that is overdoing it?
Or.
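
To make the backward-compatibility question concrete, here is a minimal
sketch of the "probe first" idea. It assumes the SA would advertise QoS
support through some capability bit. The bit value, the function names,
and the query layout below are all hypothetical, since the IBTA QoS
extensions are not published yet.

```python
# Hypothetical capability bit: the real bit position is not known from
# this thread, so this value is only a placeholder.
SA_CAP_QOS_PATH_QUERY = 1 << 0


def sa_supports_qos(sa_capability_mask):
    """Return True if the SA advertises the (assumed) QoS capability bit."""
    return bool(sa_capability_mask & SA_CAP_QOS_PATH_QUERY)


def build_path_query(sgid, dgid, pkey, sa_capability_mask,
                     service_id=None, qos_class=None):
    """Build a PathRecord query as a plain dict (illustrative layout).

    The QoS fields are added only when the SA claims to understand them,
    so an older SA never sees components it does not know about.
    """
    query = {"sgid": sgid, "dgid": dgid, "pkey": pkey}
    if sa_supports_qos(sa_capability_mask):
        if service_id is not None:
            query["service_id"] = service_id
        if qos_class is not None:
            query["qos_class"] = qos_class
    return query
```

With a mask of 0 (a legacy SA) the query carries only SGID, DGID and
PKey. With the assumed bit set, the Service-ID and QoS-Class are
included as well.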
> Sean Hefty wrote:
>>> 2. Architecture
>>> ----------------
>>
>> This is a higher level approach to the problem, but I came up with the
>> following QoS relationship hierarchy, where '->' means 'maps to'.
>>
>> Application Service -> Service ID (or range)
>> Service ID -> desired QoS
>> QoS, SGID, DGID, PKey -> SGID, DGID, TClass, FlowLabel, PKey
>> SGID, DGID, TC, FL, PKey -> SLID, DLID, SL (set if crossing subnets)
>> SLID, DLID, SL -> MTU, Rate, VL, PacketLifeTime
>>
>> I use these relationships below:
>>
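
The relationship hierarchy above can be read as a chain of table
lookups, each level feeding the next. The following is a rough sketch
of that reading only. Every table entry, GID name, and the
resolve_path() helper are made up for illustration and do not come from
the proposal.

```python
# All table contents are made-up illustrations, not values from the RFC.

# Service ID -> desired QoS
SERVICE_TO_QOS = {0x10050: "low-latency"}

# QoS, SGID, DGID, PKey -> TClass, FlowLabel
QOS_TO_GLOBAL = {
    ("low-latency", "sgid-a", "dgid-b", 0xFFFF): (0x20, 0x00001),
}

# SGID, DGID, TClass, FlowLabel, PKey -> SLID, DLID, SL
GLOBAL_TO_LOCAL = {
    ("sgid-a", "dgid-b", 0x20, 0x00001, 0xFFFF): (0x0004, 0x0009, 3),
}

# SLID, DLID, SL -> MTU, Rate, VL, PacketLifeTime
LOCAL_TO_ATTRS = {
    (0x0004, 0x0009, 3): {"mtu": 2048, "rate": "10Gbps",
                          "vl": 1, "packet_lifetime": 18},
}


def resolve_path(service_id, sgid, dgid, pkey):
    """Walk the hierarchy top-down and return the merged path fields."""
    qos = SERVICE_TO_QOS[service_id]
    tclass, flow_label = QOS_TO_GLOBAL[(qos, sgid, dgid, pkey)]
    slid, dlid, sl = GLOBAL_TO_LOCAL[(sgid, dgid, tclass, flow_label, pkey)]
    attrs = LOCAL_TO_ATTRS[(slid, dlid, sl)]
    return dict(attrs, tclass=tclass, flow_label=flow_label,
                slid=slid, dlid=dlid, sl=sl)
```

A missing entry at any level raises KeyError, which in a real
implementation would correspond to "no path satisfies this QoS".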
>>> 4. IPoIB
>>> ---------
>>>
>>> IPoIB already queries the SA for its broadcast group information. The
>>> additional functionality required is for IPoIB to provide the
>>> broadcast group SL, MTU, and RATE in every following PathRecord query
>>> performed when a new UDAV is needed by IPoIB. We could assign a
>>> special Service-ID for IPoIB use but since all communication on the
>>> same IPoIB interface shares the same QoS-Level without the ability to
>>> differentiate it by target service we can ignore it for simplicity.
>>
>> Rather than IPoIB specifying SL, MTU, and rate with PR queries, it
>> should specify TClass and FlowLabel. This is necessary for IPoIB to
>> span IB subnets.
>>
>>> 5. CMA features
>>> ----------------
>>>
>>> The CMA interface supports Service-ID through the notion of port
>>> space as a prefix to the port_num which is part of the sockaddr
>>> provided to rdma_resolve_addr(). What is missing is the explicit
>>> request for a QoS-Class that should allow the ULP (like SDP) to
>>> propagate a specific request for a class of service. A mechanism for
>>> providing the QoS-Class is available in the IPv6 address, so we could
>>> use that address field. Another option is to implement a special
>>> connection options API for CMA.
>>>
>>> What the CMA lacks is the use of the provided QoS-Class and
>>> Service-ID in the sent PR/MPR. When a response is obtained, the CMA
>>> is already required to use the PR/MPR from the response in setting
>>> up the QP address vector.
>>
>> I think the RDMA CM needs two solutions, depending on which address
>> family is used. For IPv6, the existing interface is sufficient, and
>> works for both IB and iWarp. The RDMA CM only needs to include the TC
>> and FL as part of its PR query. For IPv4, to remain transport
>> neutral, I think we should add an rdma_set_option() routine to specify
>> the QoS field. The RDMA CM would include the QoS field for PR query
>> under this condition.
>>
>> For IB, this requires changes to the ib_sa to support the new PR
>> extensions. I don't think we gain anything having the RDMA CM include
>> service IDs as part of the query.
>>
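
For the IPv6 case, one plausible reading is that the TC and FL come
from the flow information carried with the address (sin6_flowinfo in a
sockaddr_in6). The sketch below assumes the standard IPv6 header
layout, where the low 28 bits of the first header word hold an 8-bit
traffic class followed by a 20-bit flow label. The split_flowinfo()
helper is hypothetical, and it takes the value already converted to
host byte order.

```python
def split_flowinfo(flowinfo_host_order):
    """Split IPv6 flow information into (traffic_class, flow_label).

    Assumed layout of the low 28 bits: 8 bits of traffic class followed
    by 20 bits of flow label. The top 4 bits (IP version in the header)
    are ignored.
    """
    traffic_class = (flowinfo_host_order >> 20) & 0xFF
    flow_label = flowinfo_host_order & 0xFFFFF
    return traffic_class, flow_label


def make_flowinfo(traffic_class, flow_label):
    """Inverse of split_flowinfo(), useful for building test values."""
    return ((traffic_class & 0xFF) << 20) | (flow_label & 0xFFFFF)
```

The two values returned would then be placed in the TClass and
FlowLabel components of the PR query.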
>>> 6. SDP
>>> -------
>>>
>>> SDP uses CMA for building its connections. The Service-ID for SDP is
>>> 0x000000000001PPPP, where PPPP are 4 hex digits holding the remote
>>> TCP/IP Port Number to connect to. SDP might be provided with
>>> SO_PRIORITY socket option. In that case the value provided should be
>>> sent to the CMA as the TClass option of that connection.
>>
>> SDP would specify the QoS through the IPv6 address or the
>> rdma_set_option() routine.
>>
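
The SDP Service-ID layout quoted above (0x000000000001PPPP) is simple
enough to show directly. This is a small sketch; the function and
constant names are illustrative, not taken from any SDP implementation.

```python
# Fixed upper part of the SDP Service-ID: 0x0000000000010000.
SDP_SERVICE_ID_PREFIX = 0x0000000000010000


def sdp_service_id(tcp_port):
    """Return the 64-bit SDP Service-ID for a remote TCP/IP port number."""
    if not 0 <= tcp_port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return SDP_SERVICE_ID_PREFIX | tcp_port


def format_service_id(service_id):
    """Render a Service-ID as 16 hex digits, as written in the RFC text."""
    return "0x" + format(service_id, "016x")
```

For example, TCP port 80 (0x0050) maps to Service-ID
0x0000000000010050.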
>>> 7. SRP
>>> -------
>>>
>>> The current SRP implementation uses its own CM callbacks (not CMA).
>>> So SRP should fill in the Service-ID in the PR/MPR by itself and use
>>> that information in setting up the QP. The T10 SRP standard leaves
>>> the SRP Service-ID to be defined by the SRP target I/O Controller
>>> (though it should also comply with IBTA Service-ID rules). Anyway,
>>> the Service-ID is reported by the I/O Controller in the
>>> ServiceEntries DMA attribute and should be used in the PR/MPR if the
>>> SA reports its ability to handle QoS PR/MPRs.
>>
>> I agree.
>>
>>> 8. iSER
>>> --------
>>>
>>> iSER uses CMA and thus should be very close to SDP.
>>> The Service-ID for iSER should be TBD.
>>
>> See RDMA CM and SDP.
>>
>>> 3.2. PR/MPR query handling: OpenSM should be able to enforce the
>>> provided policy on client request. The overall flow for such requests
>>> is: first the request is matched against the defined match rules such
>>> that the target QoS-Level definition is found. Given the QoS-Level a
>>> path(s) search is performed with the given restrictions imposed by
>>> that level. The following two sections describe these steps.
>>
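
The two-step flow described in 3.2 (match the request to a QoS-Level,
then search paths under that level's restrictions) might look roughly
like the sketch below. The rule format, level names, and limits are all
hypothetical. The RFC text quoted here does not define them.

```python
# Hypothetical policy: names, ranges, and limits are illustrative only.
MATCH_RULES = [
    # (first Service-ID, last Service-ID, QoS-Level name)
    (0x10000, 0x1FFFF, "sdp"),
]
DEFAULT_LEVEL = "default"

QOS_LEVELS = {
    "sdp": {"sl": 1, "min_mtu": 2048},
    "default": {"sl": 0, "min_mtu": 256},
}


def match_qos_level(service_id):
    """Step 1: return the first QoS-Level whose rule matches the request."""
    for first, last, level in MATCH_RULES:
        if first <= service_id <= last:
            return level
    return DEFAULT_LEVEL


def select_paths(service_id, candidate_paths):
    """Step 2: keep only paths that satisfy the level's restrictions.

    Each candidate is a dict with at least an "mtu" key. The chosen SL
    is attached to every surviving path.
    """
    level = QOS_LEVELS[match_qos_level(service_id)]
    return [dict(path, sl=level["sl"])
            for path in candidate_paths
            if path["mtu"] >= level["min_mtu"]]
```

A request that matches no rule falls back to the default level rather
than being rejected. That fallback is a design choice of this sketch,
not something the RFC states.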
>> If we use the QoS hierarchy outlined above, I think we can construct
>> some fairly simple tables to guide our PR selection. The SA may need
>> to construct the tables starting at the bottom and working up, but I
>> *think* it could be done. And by distributing the tables, we can
>> support a more distributed (a la local SA) operation.
>>
>> From an administration point of view, I would be happier seeing
>> something where the administrator defines a QoS level in terms of
>> latency or bandwidth requirements and relative priority. Then,
>> if desired, the
>> administrator could provide more details, such as indicating which
>> nodes would use which services, minimum required MTUs, etc. It would
>> then be up to the SA to map these requirements to specific TC, FL, SL,
>> VL values.
>>
>> In general, though, I'm personally far less concerned with the QoS
>> specification interface to the SA, versus the operation that takes
>> place on the hosts.
>>
>> Comments on using this approach on the host side?