[ofa-general] SubnAdmGet (6777)

Bob Ciotti Bob.Ciotti at nasa.gov
Wed Jun 3 10:11:20 PDT 2009


On Wed, Jun 03, 2009 at 06:03:50AM -0500, Eli Dorfman (Voltaire) wrote:
> Eli Dorfman (Voltaire) wrote:
> > Hal Rosenstock wrote:
> >> On Mon, Jun 1, 2009 at 5:36 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
> >>> On Mon, Jun 1, 2009 at 4:27 PM, Sean Hefty <sean.hefty at intel.com> wrote:
> >>>>> Yes, RMPP is an overhead when the response is a single MAD but is this
> >>>>> significant ? Anyhow, how can the spec be changed in a way that
> >>>>> doesn't break existing implementations ?
> >>>> But the implementations are assuming different things about SubnAdmGet.  The SA
> >>>> is assuming that the query should fail if multiple records match.  The client
> >>>> side software (ipoib and rdma_cm) assume that it will obtain a single record
> >>>> even if multiple paths are present.  So, something needs to change.
> >>> Seems so.
> >>>
> >>>> The spec indicates that the value in the request is ignored and NumbPath is 1, not
> >>>> that NumbPath is completely ignored.
> >>> For Get, it doesn't say that the matches are pared down to this
> >>> number as it does for GetTable.
> >>>
> >>>>  Also see page 1242 in the SDP annex which
> >>>> reads: 'NumbPath could be 1 (in which case the SA query may use SubnAdmGet
> >>>> rather than SubnAdmGetTable)'.
> >>> SDP annex is not the primary source for this (chapter 15 is) and is
> >>> inconsistent and no one caught this.
> >>>
> >>>> To me, this implies that SubnAdmGet should be
> >>>> treated as equivalent to SubnAdmGetTable with NumbPath = 1.
> >>>> It just seems really odd to treat NumbPath differently for PR SubnAdmGet versus
> >>>> PR SubnAdmGetTable and MPR SubnAdmGetMulti.  Basically, this makes PR SubnAdmGet
> >>>> useless.
> >>> when there's a subnet with multiple paths and the requests are not
> >>> specific enough to use get.
> >>>
> >>> Seems like either the queries need to use RMPP, or the spec modified
> >>> (if that's possible) and the SAs updated.
> >> I sit corrected :-) Your interpretation of the spec is correct. Also,
> >> in looking at OpenSM, the intent is as you indicate: it does try to
> >> only return 1 attribute for get PR. If when returning the response,
> >> there is more than 1 attribute in the list, it returns the too many
> >> records error. There must be some code path I don't see right now
> >> which is doing this. It would be useful to know the details of the
> >> query (get request) causing this.
> >>
> > 
> > This may happen when pr_rcv_get_port_pair_paths() is called several times.
> > The only case I see is pr_rcv_process_world(), which means the request has a missing or wrong
> > src and dest port, or the component mask for SGID and DGID is 0.
> 
> correction - this may happen only when component mask for SGID and DGID is 0.
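
For reference, the two readings of SubnAdmGet debated in the quoted thread can be sketched roughly as below. This is only an illustration, not OpenSM code: the function names and the list-of-matches representation are hypothetical, and the status values are the host-order SA class-specific codes as I understand them from the spec.

```python
# Hypothetical sketch of the two SubnAdmGet interpretations discussed above.
# Status values are SA class-specific codes in host byte order (assumed).
STATUS_OK = 0x0000
ERR_NO_RECORDS = 0x0300
ERR_TOO_MANY_RECORDS = 0x0400


def respond_get_strict(matches):
    """SA-side reading: Get fails if more than one record matches."""
    if not matches:
        return ERR_NO_RECORDS, []
    if len(matches) > 1:
        return ERR_TOO_MANY_RECORDS, []
    return STATUS_OK, [matches[0]]


def respond_get_truncating(matches):
    """Client-side expectation: behave like GetTable with NumbPath = 1."""
    if not matches:
        return ERR_NO_RECORDS, []
    return STATUS_OK, [matches[0]]


# Two matching path records, represented here as plain strings.
paths = ["path-A", "path-B"]
print(hex(respond_get_strict(paths)[0]))   # strict reading rejects the query
print(respond_get_truncating(paths)[1])    # truncating reading returns one path
```

With a single matching record both readings agree; they only differ once the query is loose enough to match several records.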

Here is a mad dump of the offending sequence.

Jun 02 12:43:01 355975 [3DD13940] 0x80 -> SUBNET UP
Jun 02 12:43:03 484480 [5020B940] 0x20 -> SA MAD dump:
                                base_ver................0x1
                                mgmt_class..............0x3
                                class_ver...............0x2
                                method..................0x1 (SubnAdmGet)
                                status..................0x0
                                resv....................0x0
                                trans_id................0x2b82ad0a0000
                                attr_id.................0x11 (NodeRecord)
                                resv1...................0x0
                                attr_mod................0x0
                                rmpp_version............0x0
                                rmpp_type...............0x0
                                rmpp_flags..............0x0
                                rmpp_status.............0x0
                                seg_num.................0x0
                                payload_len/new_win.....0x0
                                sm_key..................0x0000000000000000
                                attr_offset.............0x0
                                resv2...................0x0
                                comp_mask...............0x0000000000000001


Jun 02 12:43:03 490828 [19323940] 0x01 -> osm_sa_respond: ERR 4C05: Got more than one record for SubnAdmGet (6733)
Jun 02 12:43:03 490891 [19323940] 0x20 -> SA MAD dump:
                                base_ver................0x1
                                mgmt_class..............0x3
                                class_ver...............0x2
                                method..................0x81 (SubnAdmGetResp)
                                status..................0x400
                                resv....................0x0
                                trans_id................0x2b82ad0a0000
                                attr_id.................0x11 (NodeRecord)
                                resv1...................0x0
                                attr_mod................0x0
                                rmpp_version............0x0
                                rmpp_type...............0x0
                                rmpp_flags..............0x0
                                rmpp_status.............0x0
                                seg_num.................0x0
                                payload_len/new_win.....0x0
                                sm_key..................0x0000000000000000
                                attr_offset.............0x0
                                resv2...................0x0
                                comp_mask...............0x0000000000000001
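
To help read the response above, here is a small sketch that decodes the two interesting fields. It assumes the dump prints status in host byte order, that the SA class-specific code sits in the upper byte of the 16-bit status, and that component mask bit 0 of a NodeRecord is the LID (as in OpenSM's IB_NR_COMPMASK_LID). The helper names are made up for illustration.

```python
# SA class-specific status codes (upper byte of the 16-bit MAD status field).
# Table assumed from the SubnAdm class definition; codes above 6 are omitted.
SA_STATUS = {
    0: "SUCCESS",
    1: "ERR_NO_RESOURCES",
    2: "ERR_REQ_INVALID",
    3: "ERR_NO_RECORDS",
    4: "ERR_TOO_MANY_RECORDS",
    5: "ERR_REQ_INVALID_GID",
    6: "ERR_REQ_INSUFFICIENT_COMPONENTS",
}


def decode_sa_status(status):
    """Return the name of the class-specific code in a host-order status."""
    code = (status >> 8) & 0xFF
    return SA_STATUS.get(code, "UNKNOWN(%d)" % code)


def set_bits(mask):
    """List the bit positions that are set in a 64-bit component mask."""
    return [bit for bit in range(64) if (mask >> bit) & 1]


print(decode_sa_status(0x400))        # status from the SubnAdmGetResp above
print(set_bits(0x0000000000000001))   # comp_mask from the request
```

Under those assumptions, the 0x400 status corresponds to the "Got more than one record" error logged by osm_sa_respond, and the request selected NodeRecords by a single component (bit 0).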


bob


