[ofa-general] new with IB

Hal Rosenstock halr at obsidianresearch.com
Wed Aug 27 16:07:53 PDT 2008


Podar Valentin wrote:
> Dear all,
> I am new to this field and I have some questions.
> I run a cluster with Mellanox InfiniBand hardware. I have two subnets, each with its own OpenSM (running on a different port, P1 or P2).
Are these OpenSMs on the same machine? Are the two subnets physically 
separate (no connections between the two in terms of switches)?
> Mixed hardware: MT25204 and MT23108, all at 4x rate.
> Mixed drivers: IBGold1.8.2 and MLNX_OFED_LINUX-1.3.1-rhel4.
> When I issue
> ibdiagnet -p 1 -lw 4x I get
> -I- Stages Status Report:
>     STAGE                                    Errors Warnings
>     Bad GUIDs/LIDs Check                     0      0
>     Link State Active Check                  0      0
>     Performance Counters Report              0      0
>     Specific Link Width Check                0      0
>     Partitions Check                         0      0
>     IPoIB Subnets Check                      0      16
> BUT
> -I---------------------------------------------------
> -I- IPoIB Subnets Check
> -I---------------------------------------------------
> -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:4096Byte rate:120Gbps SL:0x00
> -W- Port h1/P1 lid=0x0011 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> -W- Port h2/P1 lid=0x0321 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> -W- Port h3/P1 lid=0x0069 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> -W- Port h4/P1 lid=0x0010 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> and so on with all the nodes.
> Switch type: MT47396.
>
> I think the problem is this line:
> Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:4096Byte rate:120Gbps SL:0x00
> but I don't know how to set the MTU to 2048 and the rate to 10G.
>   
You need to set mtu to 4 and rate to 3 to do that (in the IBA encoding, 
MTU 4 is 2048 bytes and rate 3 is 10 Gb/s). There's a partitions file 
which defines this with examples. It depends on what version of OpenSM 
you are running and how it's configured.
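A minimal sketch of that suggestion, assuming the standard IBA encodings for the MTU and rate fields and an OpenSM partitions-file entry of the form `Default=0x7fff, ipoib, mtu=4, rate=3 : ALL=full;`. The helper name `partition_line` is hypothetical, and the exact entry syntax (and where the file lives) varies between OpenSM versions, so compare the output against the example partitions file shipped with yours.

```python
# IBA encodings used in SA records (MCMemberRecord, PathRecord).
MTU_CODE = {256: 1, 512: 2, 1024: 3, 2048: 4, 4096: 5}
RATE_CODE = {2.5: 2, 10: 3, 30: 4, 5: 5, 20: 6, 40: 7, 60: 8, 80: 9, 120: 10}


def partition_line(pkey, mtu_bytes, rate_gbps):
    """Build a hypothetical OpenSM partitions-file entry for an IPoIB partition.

    The 'Default=<pkey>, ipoib, mtu=<n>, rate=<n> : ALL=full;' layout is an
    assumption; check it against the example file for your OpenSM version.
    """
    mtu = MTU_CODE[mtu_bytes]    # e.g. 2048 bytes -> 4
    rate = RATE_CODE[rate_gbps]  # e.g. 10 Gb/s (4x SDR) -> 3
    return f"Default=0x{pkey:04x}, ipoib, mtu={mtu}, rate={rate} : ALL=full;"


if __name__ == "__main__":
    # Default partition, 2048-byte MTU, 10 Gb/s.
    print(partition_line(0x7FFF, 2048, 10))
```

For the values asked about here this prints `Default=0x7fff, ipoib, mtu=4, rate=3 : ALL=full;`, i.e. the "mtu 4, rate 3" above expressed as a partitions entry.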

On another note, I don't understand how you got MC groups with these 
parameters. Are you running IPoIB-CM? What switches do you have?
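To see which groups in the `saquery -g` dump quoted below are the problem, the Mtu and Rate fields can be decoded with the IBA encodings. This sketch assumes a simplified join rule (a port can join only if its own MTU and rate are at least the group's), which is an illustration rather than ibdiagnet's or OpenSM's actual logic. The names `parse_groups`, `unjoinable` and `SAMPLE` are hypothetical; `SAMPLE` holds two of the quoted records.

```python
import re

# IBA encodings of the MCMemberRecord Mtu and Rate fields.
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}

# Matches lines such as "Mtu.....................0x5".
FIELD = re.compile(r"^\s*(MGID|Mlid|Mtu|pkey|Rate)\.+(.+?)\s*$")


def parse_groups(dump):
    """Split saquery-style output into one dict per MCMemberRecord."""
    groups, current = [], None
    for line in dump.splitlines():
        if "MCMemberRecord group dump" in line:
            current = {}
            groups.append(current)
            continue
        m = FIELD.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return groups


def unjoinable(groups, port_mtu=2048, port_rate=10):
    """Yield (MGID, MTU in bytes, rate in Gb/s) for groups the port cannot join.

    Simplified assumption: a port can join only if its MTU and rate are
    at least the group's MTU and rate.
    """
    for g in groups:
        mtu = MTU_BYTES[int(g["Mtu"], 16)]
        rate = RATE_GBPS[int(g["Rate"], 16)]
        if mtu > port_mtu or rate > port_rate:
            yield g["MGID"], mtu, rate


# Two of the records from the dump quoted in this thread.
SAMPLE = """\
MCMemberRecord group dump:
                MGID....................0xff12401bffff0000 : 0x00000000ffffffff
                Mlid....................0xC000
                Mtu.....................0x5
                pkey....................0xFFFF
                Rate....................0xA
MCMemberRecord group dump:
                MGID....................0xff12401bffff0000 : 0x0000000000656565
                Mlid....................0xC002
                Mtu.....................0x4
                pkey....................0xFFFF
                Rate....................0x3
"""

if __name__ == "__main__":
    # A 10 Gb/s (4x SDR) port with a 2048-byte MTU, the values wanted for the group.
    for mgid, mtu, rate in unjoinable(parse_groups(SAMPLE)):
        print(f"{mgid}: MTU {mtu} bytes, {rate} Gb/s")
```

Decoded this way, the first two records in the full dump (MGIDs ending in ffffffff and 00000001) are 4096 bytes at 120 Gb/s, matching the group ibdiagnet reports; the remaining ones are 2048 bytes at 10 or 2.5 Gb/s.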

-- Hal
> more
>  /usr/sbin/saquery -d -g
>
> MCMemberRecord group dump:
>                 MGID....................0xff12401bffff0000 : 0x00000000ffffffff
>                 Mlid....................0xC000
>                 Mtu.....................0x5
>                 pkey....................0xFFFF
>                 Rate....................0xA
> MCMemberRecord group dump:
>                 MGID....................0xff12401bffff0000 : 0x0000000000000001
>                 Mlid....................0xC001
>                 Mtu.....................0x5
>                 pkey....................0xFFFF
>                 Rate....................0xA
> MCMemberRecord group dump:
>                 MGID....................0xff12401bffff0000 : 0x0000000000656565
>                 Mlid....................0xC002
>                 Mtu.....................0x4
>                 pkey....................0xFFFF
>                 Rate....................0x3
> MCMemberRecord group dump:
>                 MGID....................0xff12401bffff0000 : 0x0000000000a847ff
>                 Mlid....................0xC003
>                 Mtu.....................0x4
>                 pkey....................0xFFFF
>                 Rate....................0x3
> MCMemberRecord group dump:
>                 MGID....................0xff12401bffff0000 : 0x0000000000000000
>                 Mlid....................0xC007
>                 Mtu.....................0x4
>                 pkey....................0xFFFF
>                 Rate....................0x2
> MCMemberRecord group dump:
>                 MGID....................0xff12401bffff0000 : 0x000000000202c902
>                 Mlid....................0xC008
>                 Mtu.....................0x4
>                 pkey....................0xFFFF
>                 Rate....................0x2
>
> The question is: how can I set the group rate to 10G instead of 120G, and the group MTU to 2048? On some nodes I get "failed to join multicast or setting MTU>4096 will ...generate...some errors".
>
> Thank you very much!
> Vali
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general