[Users] IB Partitioning
Jesper Larsen
JLA at fcoo.dk
Tue Sep 18 01:24:06 PDT 2012
Hi David
Thanks for the help. I have now configured a subinterface and verified that it is used when I ping one of my nodes. Unfortunately, it only works with this opensm subnet manager configuration:
#Default=0x7fff, rate=7, defmember=limited: ALL;
Default=0x7fff, ipoib, rate=7: ALL=full;
DevNet=0x0001, ipoib, rate=7, defmember=full: 0x0002c903004ef895, 0x78e7d1030023ffd5, 0x78e7d1030023fdfd, 0x78e7d10300239885, 0x78e7d1030023ff7d, 0x78e7d1030021ecad, 0x78e7d10300245fa5, 0x78e7d1030024578d;
It does not work when I use the commented-out line in place of the second line. I have checked the addresses in DevNet and they are OK. The pkeys also seem to check out (I assume the leading 8 instead of 0 is fine):
[root@dn003 ~]# cat /sys/class/net/ib0.8001/pkey
0x8001
[root@bifrost ~]# cat /sys/class/net/ib0.8001/pkey
0x8001
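A note on the 8 versus 0: in InfiniBand the top bit of the 16-bit P_Key is the membership bit (1 = full, 0 = limited) and the low 15 bits identify the partition, so 0x8001 is partition 0x0001 with full membership. Two ports can exchange traffic only if their keys have the same low 15 bits and at least one of them is a full member. The sketch below illustrates that rule; the function names `decode_pkey` and `can_communicate` are made up for this example and are not part of any OFED tool.

```python
from typing import Tuple


def decode_pkey(value: int) -> Tuple[int, str]:
    """Split a 16-bit P_Key into its 15-bit base key and membership type."""
    base = value & 0x7FFF                      # low 15 bits: partition number
    membership = "full" if value & 0x8000 else "limited"  # top bit
    return base, membership


def can_communicate(pkey_a: int, pkey_b: int) -> bool:
    """Apply the P_Key matching rule: same base, and not both limited."""
    base_a, member_a = decode_pkey(pkey_a)
    base_b, member_b = decode_pkey(pkey_b)
    return base_a == base_b and "full" in (member_a, member_b)


for raw in ("0x8001", "0x0001", "0xffff", "0x7fff"):
    base, membership = decode_pkey(int(raw, 16))
    print(f"{raw}: base=0x{base:04x}, membership={membership}")
```

Under this rule, two nodes that are both limited members of the default partition (0x7fff) cannot talk to each other over that partition, while a limited member can still talk to a full member such as the subnet manager node.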
My routing table is:
[root@bifrost ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.128.0   *               255.255.255.0   U     0      0        0 ib0.8001
192.168.0.0     *               255.255.255.0   U     0      0        0 eth0
192.168.127.0   *               255.255.255.0   U     0      0        0 ib0
10.230.80.0     *               255.255.255.0   U     0      0        0 eth4
link-local      *               255.255.0.0     U     1002   0        0 eth0
link-local      *               255.255.0.0     U     1006   0        0 eth4
link-local      *               255.255.0.0     U     1013   0        0 ib0
link-local      *               255.255.0.0     U     1015   0        0 ib0.8001
default         10.230.80.3     0.0.0.0         UG    0      0        0 eth4
[root@bifrost ~]# ping 192.168.128.103
PING 192.168.128.103 (192.168.128.103) 56(84) bytes of data.
^C
--- 192.168.128.103 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2974ms
So it should use ib0.8001 for the ping command. Any hints on what I am doing wrong?
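The interface choice can be double-checked with a small longest-prefix-match lookup over the routing table above. This is only a sketch: it models prefix length and ignores metrics, `pick_interface` is a hypothetical helper, and "link-local" is taken to mean 169.254.0.0/16.

```python
import ipaddress

# (destination network, interface), transcribed from the routing table above.
ROUTES = [
    ("192.168.128.0/24", "ib0.8001"),
    ("192.168.0.0/24", "eth0"),
    ("192.168.127.0/24", "ib0"),
    ("10.230.80.0/24", "eth4"),
    ("169.254.0.0/16", "eth0"),   # link-local; assumed to be 169.254.0.0/16
    ("0.0.0.0/0", "eth4"),        # default route via 10.230.80.3
]


def pick_interface(destination: str) -> str:
    """Return the interface of the most specific route matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), iface)
        for net, iface in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The default route always matches, so matches is never empty here.
    network, iface = max(matches, key=lambda item: item[0].prefixlen)
    return iface


print(pick_interface("192.168.128.103"))  # ib0.8001
```

So the routing side does point at ib0.8001 for this address, which suggests the packet loss comes from the partition setup rather than from IP routing.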
I have also looked for tools that can show or sniff which pkey is actually used in a communication, but I haven't been able to figure out what to use (I looked at smpdump and smpquery). Any pointers?
Best regards,
Jesper
On Fri, Sep 14, 2012 at 4:21 AM, Jesper Larsen <JLA at fcoo.dk> wrote:
Dear OFED users,
We have a cluster connected by a single IB switch which we want to split into two separate partitions (development/test and production). The partitions must not be able to talk to each other since we do not want errors happening on the development partition to be able to affect the production partition. I have therefore tried to make a partition configuration file for opensm:
/etc/opensm/partitions.conf
which at this point (just testing if I can make it work for a single non-default partition) looks something like this:
Default=0xffff, rate=7, defmember=limited: ALL;
DevNet=0x0001, ipoib, rate=7, defmember=full: 0x0002c903004ef895, 0x78e7d1030023ffd5, 0x78e7d1030023fdfd, 0x78e7d10300239885, 0x78e7d1030023ff7d, 0x78e7d1030021ecad, 0x78e7d10300245fa5, 0x78e7d1030024578d;
So I have made all members of the default group limited so that they cannot talk to each other. The reason is that all nodes are members of the default group, and as far as I understand that cannot be changed. I have also made a new group containing only development nodes. But when I try to ping one development node from another (over ib0) it cannot find it (Destination Host Unreachable). And when I look at the two development nodes I see:
# cat /sys/class/net/ib0/pkey
0xffff
This will always show 0xffff since ib0 is always on the default partition.
Your 0x0001 partition will be used by ib0.8001 since IB partitions look like VLANs in the network interface world.
If you want to know what pkeys your HCA has, look at something like /sys/class/infiniband/*/ports/1/pkeys/*
Dave
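The sysfs path mentioned above can be walked with a few lines of Python to list the pkeys each HCA port currently holds. This is a rough sketch: `list_pkeys` is a hypothetical helper, it scans every port rather than only port 1, and it assumes unused table slots read as 0x0000 and can be skipped.

```python
import glob
import os


def list_pkeys(root="/sys/class/infiniband"):
    """Return {(device, port): [pkey, ...]} for all non-empty pkey slots."""
    pattern = os.path.join(root, "*", "ports", "*", "pkeys", "*")
    found = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            value = int(handle.read().strip(), 16)
        if value == 0:
            continue  # assumed to be an unused slot in the pkey table
        parts = path.split(os.sep)
        device, port = parts[-5], parts[-3]  # .../<device>/ports/<port>/pkeys/<index>
        found.setdefault((device, port), []).append(value)
    return found


for (device, port), pkeys in list_pkeys().items():
    formatted = ", ".join(f"0x{pkey:04x}" for pkey in pkeys)
    print(f"{device} port {port}: {formatted}")
```

On a node that is a full member of DevNet, the list for its port would be expected to include 0x8001; on a machine without InfiniBand hardware the script simply prints nothing.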
Does this mean that it is only trying the default network? And if yes, how do I make it try the DevNet?
Best regards,
Jesper