[ofa-general] new with IB
Hal Rosenstock
halr at obsidianresearch.com
Wed Aug 27 16:07:53 PDT 2008
Podar Valentin wrote:
> Dear all,
> I am new to this field and I have some questions.
> I run a cluster with Mellanox IB hardware. I have two subnets, each with its own opensm (running on a different port, P1 or P2).
Are these OpenSM instances on the same machine? Are the two subnets
physically separate (no switch connections between the two)?
> Mixed hardware: MT25204 and MT23108. All are at 4x rate.
> Mixed drivers: IBGold1.8.2 and MLNX_OFED_LINUX-1.3.1-rhel4.
> When I issue
> ibdiagnet -p 1 -lw 4x
> I get
> -I- Stages Status Report:
>     STAGE                          Errors  Warnings
>     Bad GUIDs/LIDs Check           0       0
>     Link State Active Check        0       0
>     Performance Counters Report    0       0
>     Specific Link Width Check      0       0
>     Partitions Check               0       0
>     IPoIB Subnets Check            0       16
> BUT
> -I---------------------------------------------------
> -I- IPoIB Subnets Check
> -I---------------------------------------------------
> -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:4096Byte rate:120Gbps SL:0x00
> -W- Port h1/P1 lid=0x0011 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> -W- Port h2/P1 lid=0x0321 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> -W- Port h3/P1 lid=0x0069 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> -W- Port h4/P1 lid=0x0010 guid=0x dev=23108 can not join due to rate:10Gbps < group:120Gbps
> and so on with all the nodes.
> switch type MT47396.
>
> I think the problem is this line
> Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:4096Byte rate:120Gbps SL:0x00
> but I don't know how to set the Mtu to 2048 and rate to 10G.
>
You need to set mtu to 4 (2048 bytes) and rate to 3 (10 Gbps) to do that.
These are set in the OpenSM partitions file, which comes with examples.
The details depend on what version of OpenSM you are running and how it's
configured.
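
For reference, here is a minimal Python sketch of how the encoded mtu and rate values map to bytes and Gbps. The tables reflect the standard InfiniBand encodings as I understand them (they agree with the Mtu 0x5 / Rate 0xA records and the 4096 / 120Gbps report quoted in this thread); the function names are made up for illustration and are not part of OpenSM or any IB tool.

```python
# IB-encoded MTU values: code -> payload size in bytes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# IB-encoded rate values: code -> Gbps (only the codes relevant to
# SDR/DDR-era hardware such as that discussed in this thread).
IB_RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}


def decode_mtu(code):
    """Return the MTU in bytes for an encoded MTU value."""
    return IB_MTU_BYTES[code]


def decode_rate(code):
    """Return the rate in Gbps for an encoded rate value."""
    return IB_RATE_GBPS[code]


def can_join(port_rate_gbps, group_rate_code):
    """Hypothetical helper mirroring the ibdiagnet warning:
    a port slower than the group rate cannot join the group."""
    return port_rate_gbps >= decode_rate(group_rate_code)


if __name__ == "__main__":
    # The values suggested in this reply: mtu=4, rate=3.
    print("mtu=4  ->", decode_mtu(4), "bytes")
    print("rate=3 ->", decode_rate(3), "Gbps")
    # A 10 Gbps (4x SDR) port against a group created with rate 0xA.
    print("10G port joins rate 0xA group:", can_join(10, 0xA))
    print("10G port joins rate 0x3 group:", can_join(10, 0x3))
```

With these tables, a group created with rate 0xA requires 120 Gbps, which is why the 4x SDR ports are reported as unable to join.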
On another note, I don't understand how you got MC groups with these
parameters. Are you running IPoIB-CM? What switches do you have?
-- Hal
> more
> /usr/sbin/saquery -d -g
>
> MCMemberRecord group dump:
> MGID....................0xff12401bffff0000 : 0x00000000ffffffff
> Mlid....................0xC000
> Mtu.....................0x5
> pkey....................0xFFFF
> Rate....................0xA
> MCMemberRecord group dump:
> MGID....................0xff12401bffff0000 : 0x0000000000000001
> Mlid....................0xC001
> Mtu.....................0x5
> pkey....................0xFFFF
> Rate....................0xA
> MCMemberRecord group dump:
> MGID....................0xff12401bffff0000 : 0x0000000000656565
> Mlid....................0xC002
> Mtu.....................0x4
> pkey....................0xFFFF
> Rate....................0x3
> MCMemberRecord group dump:
> MGID....................0xff12401bffff0000 : 0x0000000000a847ff
> Mlid....................0xC003
> Mtu.....................0x4
> pkey....................0xFFFF
> Rate....................0x3
> MCMemberRecord group dump:
> MGID....................0xff12401bffff0000 : 0x0000000000000000
> Mlid....................0xC007
> Mtu.....................0x4
> pkey....................0xFFFF
> Rate....................0x2
> MCMemberRecord group dump:
> MGID....................0xff12401bffff0000 : 0x000000000202c902
> Mlid....................0xC008
> Mtu.....................0x4
> pkey....................0xFFFF
> Rate....................0x2
>
> The question is: how can I set the group rate to 10G instead of 120G, and the group MTU to 2048? On some nodes I get " failed to join multicast or setting MTU>4096 will ...generate...some errors"
>
> thank you very much!
> Vali
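
The saquery dump quoted above can also be read mechanically. The following is a rough Python sketch, assuming the dump keeps the `Field....value` layout shown in this message; the parser and its function names are hypothetical and only handle the Mlid, Mtu and Rate fields. The sample text is copied from two of the records above.

```python
import re

# Encoded MTU (bytes) and rate (Gbps) values, standard IB encodings.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
IB_RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}

# Matches lines such as "Mtu.....................0x5".
FIELD_RE = re.compile(r"^(Mlid|Mtu|Rate)\.+(0x[0-9A-Fa-f]+)$")

SAMPLE = """\
MCMemberRecord group dump:
MGID....................0xff12401bffff0000 : 0x00000000ffffffff
Mlid....................0xC000
Mtu.....................0x5
pkey....................0xFFFF
Rate....................0xA
MCMemberRecord group dump:
MGID....................0xff12401bffff0000 : 0x0000000000656565
Mlid....................0xC002
Mtu.....................0x4
pkey....................0xFFFF
Rate....................0x3
"""


def parse_groups(dump_text):
    """Split a dump into one dict per group, keeping Mlid/Mtu/Rate as ints."""
    groups = []
    for raw in dump_text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate email quote prefixes
        if line.startswith("MCMemberRecord group dump"):
            groups.append({})
            continue
        match = FIELD_RE.match(line)
        if match and groups:
            groups[-1][match.group(1)] = int(match.group(2), 16)
    return groups


def summarize(dump_text):
    """Return (mlid, mtu_bytes, rate_gbps) for each group in the dump."""
    return [
        (g["Mlid"], IB_MTU_BYTES[g["Mtu"]], IB_RATE_GBPS[g["Rate"]])
        for g in parse_groups(dump_text)
    ]


if __name__ == "__main__":
    for mlid, mtu, rate in summarize(SAMPLE):
        print(f"MLID 0x{mlid:04X}: MTU {mtu} bytes, rate {rate} Gbps")
```

Read this way, the groups with Mtu 0x5 and Rate 0xA are the 4096-byte, 120 Gbps ones that the 10 Gbps ports cannot join, while those with Mtu 0x4 and Rate 0x3 already match the 2048-byte, 10 Gbps settings being asked for.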