[openib-general] Re: OpenSM problem?!

Hal Rosenstock halr at voltaire.com
Fri Sep 16 02:59:13 PDT 2005


Hi Roel,

On Thu, 2005-09-15 at 20:21, Roel van der Goot wrote:
> Hi Hal,
> 
> If my interpretation of the spec is correct, it seems that OpenSM
> has problems with a window size of one during the RMPP protocol.
> Shortly after receiving a lid from the OpenSM, our application
> asks for subnAdmGetTable(SANodeRecord):
                           ^^^^^^^^^^^^
                   (nit)   NodeRecord

>      sending MAD:
> 
>      MAD:                struct SAHeader (56 bytes) -
>                          SA Header (section 15.2.1.1)
>      {
>          RMPPHeader:         struct RMPPHeader (36 bytes) -
>                              RMPP Header Fields (section 13.6.2.1)
>          {
>              MADHeader:          struct MADHeader (24 bytes) -
>                                  MAD Base Header (section 13.4.3)
>              {
>                  baseVersion:        0x01    (8 bit uint)
>                  mgmtClass:          0x03    (8 bit uint)
>                  classVersion:       0x02    (8 bit uint)
>                  method:             0x12    (8 bit uint)
>                  status:             0x0000    (16 bit uint)
>                  classSpecific:      0x0000    (16 bit uint)
>                  transactionID:      0x000000000003B59E    (64 bit uint)
>                  attributeID:        0x0011    (16 bit uint)
>                  rsv0:               0x0000    (16 bit uint)
>                  attributeModifier:  0x00000000    (32 bit uint)
>              }
>              RMPPVersion:        0x00    (8 bit uint)
>              RMPPType:           0x00    (8 bit uint)
>              RRespTime:          0x00    (5 bit uint)
>              RMPPFlags:          0x0    (3 bit uint)
>              RMPPStatus:         0x00    (8 bit uint)
>              data1:              0x4E4F4E4F    (32 bit uint)
>              data2:              0x4E4F4E4F    (32 bit uint)
>          }
>          SMKey:              0x0000000000000000    (64 bit uint)
>          attributeOffset:    0x000E    (16 bit uint)
>          rsv0:               0x0000    (16 bit uint)
>          componentMask:      0x0000000000000000    (64 bit uint)
>      }

Right, that is the SubnAdmGetTable(NodeRecord) request which asks for
all NodeRecords. AttributeOffset need not be set here but it causes no
harm.
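
For anyone following along without the spec at hand, here is a minimal
sketch of decoding the 24-byte MAD base header from the request above.
It is illustrative Python, not OpenIB code; the function name and the
lookup tables are made up for this example and only cover the values
that appear in this thread.

```python
import struct

# Names for the few values that appear in this thread's dumps.
METHODS = {0x12: "SubnAdmGetTable", 0x92: "SubnAdmGetTableResp"}
ATTRIBUTES = {0x0011: "NodeRecord"}


def decode_mad_header(raw):
    """Decode a 24-byte MAD base header (big-endian on the wire)."""
    (base_version, mgmt_class, class_version, method, status,
     class_specific, tid, attr_id, _rsv, attr_mod) = struct.unpack(
        ">BBBBHHQHHI", raw)
    return {
        "base_version": base_version,
        "mgmt_class": mgmt_class,
        "class_version": class_version,
        "method": METHODS.get(method, hex(method)),
        "is_response": bool(method & 0x80),  # top bit marks a response
        "status": status,
        "tid": tid,
        "attribute": ATTRIBUTES.get(attr_id, hex(attr_id)),
        "attr_mod": attr_mod,
    }


# Header bytes of the request, transcribed from the dump above.
request = bytes.fromhex(
    "01030212" "00000000" "000000000003B59E" "00110000" "00000000")
print(decode_mad_header(request))
```

The same function applied to the reply header would report
SubnAdmGetTableResp with the response bit set.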

> The OpenSM replies:
> 
>      received MAD:
>      struct HdrLRH (8 bytes) -
>                          Local Route Header (section 7.7)
>      {
>          VL:                 0x0    (4 bit uint)
>          LVer:               0x0    (4 bit uint)
>          SL:                 0x0    (4 bit uint)
>          rsv0:               0x0    (2 bit uint)
>          LNH:                0x2    (2 bit uint)
>          DLID:               0x0001    (16 bit uint)
>          rsv1:               0x00    (5 bit uint)
>          pktLen:             0x048    (11 bit uint)
>          SLID:               0x0002    (16 bit uint)
>      }
>      MAD:                struct SAHeader (56 bytes) -
>                          SA Header (section 15.2.1.1)
>      {
>          RMPPHeader:         struct RMPPHeader (36 bytes) -
>                              RMPP Header Fields (section 13.6.2.1)
>          {
>              MADHeader:          struct MADHeader (24 bytes) -
>                                  MAD Base Header (section 13.4.3)
>              {
>                  baseVersion:        0x01    (8 bit uint)
>                  mgmtClass:          0x03    (8 bit uint)
>                  classVersion:       0x02    (8 bit uint)
>                  method:             0x92    (8 bit uint)
>                  status:             0x0000    (16 bit uint)
>                  classSpecific:      0x0000    (16 bit uint)
>                  transactionID:      0x000000000003B59E    (64 bit uint)
>                  attributeID:        0x0011    (16 bit uint)
>                  rsv0:               0x0000    (16 bit uint)
>                  attributeModifier:  0x00000000    (32 bit uint)
>              }
>              RMPPVersion:        0x01    (8 bit uint)
>              RMPPType:           0x01    (8 bit uint)
>              RRespTime:          0x00    (5 bit uint)
>              RMPPFlags:          0x3    (3 bit uint)
>              RMPPStatus:         0x00    (8 bit uint)
>              data1:              0x00000001    (32 bit uint)
>              data2:              0x000002F0    (32 bit uint)
>          }
>          SMKey:              0x0000000000000000    (64 bit uint)
>          attributeOffset:    0x000E    (16 bit uint)
>          rsv0:               0x0000    (16 bit uint)
>          componentMask:      0x0000000000000000    (64 bit uint)
>      }

This is a first DATA packet (with more to come).
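
A small sketch of how the RMPP fields in that dump can be read. It is
illustrative Python rather than the kernel code, and it assumes that
RRespTime occupies the upper five bits and RMPPFlags the lower three
bits of the shared byte, with flag bits Active = 0x1, First = 0x2 and
Last = 0x4. For a DATA segment, data1 is the segment number and data2
is the payload length.

```python
import struct

RMPP_TYPES = {1: "DATA", 2: "ACK", 3: "STOP", 4: "ABORT"}
FLAG_ACTIVE, FLAG_FIRST, FLAG_LAST = 0x1, 0x2, 0x4


def decode_rmpp(raw):
    """Decode the 12 RMPP bytes that follow the MAD base header."""
    version, rmpp_type, rtime_flags, status, data1, data2 = struct.unpack(
        ">BBBBII", raw)
    flags = rtime_flags & 0x7          # assumed: low three bits are flags
    return {
        "version": version,
        "type": RMPP_TYPES.get(rmpp_type, hex(rmpp_type)),
        "rresp_time": rtime_flags >> 3,  # assumed: high five bits
        "active": bool(flags & FLAG_ACTIVE),
        "first": bool(flags & FLAG_FIRST),
        "last": bool(flags & FLAG_LAST),
        "status": status,
        "data1": data1,
        "data2": data2,
    }


# RMPP bytes of the reply, transcribed from the dump above:
# version 1, type 1, RRespTime 0 / flags 0x3, status 0,
# data1 = 0x00000001, data2 = 0x000002F0.
first_data = bytes.fromhex("01010300" "00000001" "000002F0")
print(decode_rmpp(first_data))
```

With flags 0x3 the segment is Active and First but not Last, which is
why more segments are expected.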

> The application acks with a window size of 1:

>      sending MAD:
>      unrecognized MAD class attribute: 0x0000
>      MAD:                struct SAHeader (56 bytes) -
>                          SA Header (section 15.2.1.1)
>      {
>          RMPPHeader:         struct RMPPHeader (36 bytes) -
>                              RMPP Header Fields (section 13.6.2.1)
>          {
>              MADHeader:          struct MADHeader (24 bytes) -
>                                  MAD Base Header (section 13.4.3)
>              {
>                  baseVersion:        0x01    (8 bit uint)
>                  mgmtClass:          0x03    (8 bit uint)
>                  classVersion:       0x00    (8 bit uint)
>                  method:             0x00    (8 bit uint)
>                  status:             0x0000    (16 bit uint)
>                  classSpecific:      0x0000    (16 bit uint)
>                  transactionID:      0x000000000003B59E    (64 bit uint)
>                  attributeID:        0x0000    (16 bit uint)
>                  rsv0:               0x0000    (16 bit uint)
>                  attributeModifier:  0x00000000    (32 bit uint)
>              }
>              RMPPVersion:        0x01    (8 bit uint)
>              RMPPType:           0x02    (8 bit uint)
>              RRespTime:          0x14    (5 bit uint)
>              RMPPFlags:          0x1    (3 bit uint)

FYI, this doesn't need setting for ACKs.

>              RMPPStatus:         0x00    (8 bit uint)
>              data1:              0x00000001    (32 bit uint)
>              data2:              0x00000002    (32 bit uint)
>          }
>          SMKey:              0x0000000000000000    (64 bit uint)
>          attributeOffset:    0x0000    (16 bit uint)
>          rsv0:               0x0000    (16 bit uint)
>          componentMask:      0x0000000000000000    (64 bit uint)
>      }

That is an ACK for segment 1 (which OpenSM just sent) with a
NewWindowLast of 2. (So your SA client appears to be something other
than OpenIB, right?)

The one thing wrong with it, which causes it to be ignored on the
OpenSM side, is that the method is not set properly: the ACK carries
method 0x00, but it should be SubnAdmGetTable (0x12). Can you change
that and retry?
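
A hedged sketch of building such an ACK header, in illustrative Python
(build_ack is a made-up helper, not an OpenIB function). It copies the
MAD base header from the DATA segment being acknowledged and clears the
response bit, so 0x92 becomes 0x12. Carrying over classVersion and
attributeID as well is an assumption of this sketch; only the method is
known to matter here. RRespTime is left at 0 by default, and the bit
packing of RRespTime and RMPPFlags is assumed to be five high bits and
three low bits.

```python
import struct

RMPP_VERSION = 1
RMPP_TYPE_ACK = 2
RMPP_FLAG_ACTIVE = 0x1
METHOD_RESP_BIT = 0x80


def build_ack(data_hdr, segment, new_window_last, rresp_time=0):
    """Return a 36-byte MAD + RMPP header acknowledging `segment`."""
    fields = list(struct.unpack(">BBBBHHQHHI", data_hdr[:24]))
    fields[3] &= ~METHOD_RESP_BIT & 0xFF   # method: 0x92 -> 0x12
    mad = struct.pack(">BBBBHHQHHI", *fields)
    rtime_flags = ((rresp_time & 0x1F) << 3) | RMPP_FLAG_ACTIVE
    # For an ACK, data1 is the segment number and data2 is NewWindowLast.
    rmpp = struct.pack(">BBBBII", RMPP_VERSION, RMPP_TYPE_ACK,
                       rtime_flags, 0, segment, new_window_last)
    return mad + rmpp


# MAD base header of the DATA segment, transcribed from the dump above.
data_hdr = bytes.fromhex(
    "01030292" "00000000" "000000000003B59E" "00110000" "00000000")
ack = build_ack(data_hdr, segment=1, new_window_last=2)
print(ack.hex())
```

Deriving the ACK from the received header keeps the transaction ID and
class fields consistent with the transfer being acknowledged.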

> OpenSM says that it received the initial packet again:
          ^^^^^^^^^^^^^^^^^^^^
          sends

>      received MAD:
>      struct HdrLRH (8 bytes) -
>                          Local Route Header (section 7.7)
>      {
>          VL:                 0x0    (4 bit uint)
>          LVer:               0x0    (4 bit uint)
>          SL:                 0x0    (4 bit uint)
>          rsv0:               0x0    (2 bit uint)
>          LNH:                0x2    (2 bit uint)
>          DLID:               0x0001    (16 bit uint)
>          rsv1:               0x00    (5 bit uint)
>          pktLen:             0x048    (11 bit uint)
>          SLID:               0x0002    (16 bit uint)
>      }
>      MAD:                struct SAHeader (56 bytes) -
>                          SA Header (section 15.2.1.1)
>      {
>          RMPPHeader:         struct RMPPHeader (36 bytes) -
>                              RMPP Header Fields (section 13.6.2.1)
>          {
>              MADHeader:          struct MADHeader (24 bytes) -
>                                  MAD Base Header (section 13.4.3)
>              {
>                  baseVersion:        0x01    (8 bit uint)
>                  mgmtClass:          0x03    (8 bit uint)
>                  classVersion:       0x02    (8 bit uint)
>                  method:             0x92    (8 bit uint)
>                  status:             0x0000    (16 bit uint)
>                  classSpecific:      0x0000    (16 bit uint)
>                  transactionID:      0x000000000003B59E    (64 bit uint)
>                  attributeID:        0x0011    (16 bit uint)
>                  rsv0:               0x0000    (16 bit uint)
>                  attributeModifier:  0x00000000    (32 bit uint)
>              }
>              RMPPVersion:        0x01    (8 bit uint)
>              RMPPType:           0x01    (8 bit uint)
>              RRespTime:          0x00    (5 bit uint)
>              RMPPFlags:          0x3    (3 bit uint)
>              RMPPStatus:         0x00    (8 bit uint)
>              data1:              0x00000001    (32 bit uint)
>              data2:              0x000002F0    (32 bit uint)
>          }
>          SMKey:              0x0000000000000000    (64 bit uint)
>          attributeOffset:    0x000E    (16 bit uint)
>          rsv0:               0x0000    (16 bit uint)
>          componentMask:      0x0000000000000000    (64 bit uint)
>      }

This is probably a retransmission on timeout, as the OpenSM side has
not received a proper ACK.

Note that it is the kernel RMPP code which handles most of what is
being discussed here.

-- Hal

> Our application acks once again:
> 
> sending MAD:
> unrecognized MAD class attribute: 0x0000
> MAD:                struct SAHeader (56 bytes) -
>                      SA Header (section 15.2.1.1)
> {
>      RMPPHeader:         struct RMPPHeader (36 bytes) -
>                          RMPP Header Fields (section 13.6.2.1)
>      {
>          MADHeader:          struct MADHeader (24 bytes) -
>                              MAD Base Header (section 13.4.3)
>          {
>              baseVersion:        0x01    (8 bit uint)
>              mgmtClass:          0x03    (8 bit uint)
>              classVersion:       0x00    (8 bit uint)
>              method:             0x00    (8 bit uint)
>              status:             0x0000    (16 bit uint)
>              classSpecific:      0x0000    (16 bit uint)
>              transactionID:      0x000000000003B59E    (64 bit uint)
>              attributeID:        0x0000    (16 bit uint)
>              rsv0:               0x0000    (16 bit uint)
>              attributeModifier:  0x00000000    (32 bit uint)
>          }
>          RMPPVersion:        0x01    (8 bit uint)
>          RMPPType:           0x02    (8 bit uint)
>          RRespTime:          0x14    (5 bit uint)
>          RMPPFlags:          0x1    (3 bit uint)
>          RMPPStatus:         0x00    (8 bit uint)
>          data1:              0x00000001    (32 bit uint)
>          data2:              0x00000002    (32 bit uint)
>      }
>      SMKey:              0x0000000000000000    (64 bit uint)
>      attributeOffset:    0x0000    (16 bit uint)
>      rsv0:               0x0000    (16 bit uint)
>      componentMask:      0x0000000000000000    (64 bit uint)
> }
> 
> The RMPP sequence times out, our application sends another
> subnAdmGetTable(SANodeRecord), ad infinitum.
> 
> Cheers :-),
> Roel.



