[ewg] OpenSM Configuration
Hal Rosenstock
hal at dev.mellanox.co.il
Sat Apr 12 08:38:30 PDT 2014
On 4/12/2014 11:29 AM, Atul Yadav wrote:
> Hi,
>
> Yes, I am able to ping all the nodes connected to the InfiniBand switch.
> For more details, please go through the attachment.
OpenSM looks fine, although it is very old (3.3.5). Is this SM host based
or embedded in one of your switches?
I didn't see any output showing that pings work, but I'll take your word
for it. If pings work, I have no theory as to why rc_pingpong wouldn't.
-- Hal
>
>
>
> Thanks
> Atul Yadav
>
>
> On Sat, Apr 12, 2014 at 7:28 PM, Hal Rosenstock <hal at dev.mellanox.co.il> wrote:
>
> On 4/12/2014 6:59 AM, Atul Yadav wrote:
> > Hi,
> >
> > Thanks for replying.
> > In this architecture, when we run ibv_rc_pingpong between two nodes
> > connected to the same switch, we get a result. But when we use two
> > nodes connected through 2 switches, we get an error.
> >
> > Success:-
> > [root at oss1 ~]# ibv_rc_pingpong
> > local address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::
> > remote address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::
> > 8192000 bytes in 0.01 seconds = 6992.74 Mbit/sec
> > 1000 iters in 0.01 seconds = 9.37 usec/iter
> > [root at oss1 ~]#
> >
> > [root at mds1 ~]# ibv_rc_pingpong 173.16.1.52
> > local address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::
> > remote address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::
> > 8192000 bytes in 0.01 seconds = 7084.97 Mbit/sec
> > 1000 iters in 0.01 seconds = 9.25 usec/iter
> > [root at mds1 ~]#
> >
> >
> >
> >
> > Error
> > [root at nalanda mvapich2-1.9]# ibv_rc_pingpong
> > local address: LID 0x0001, QPN 0x56004e, PSN 0x704d51
> > remote address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2
> >
> > [root at mds1 ~]# ibv_rc_pingpong 173.16.1.1
> > local address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2, GID ::
> > client read: Success
> > Couldn't read remote address
> > [root at mds1 ~]#
>
> Looking at libibverbs/examples/rc_pingpong.c:
>
> static struct pingpong_dest *pp_client_exch_dest(const char *servername, int port,
>                                                  const struct pingpong_dest *my_dest)
> {
>     ...
>     gid_to_wire_gid(&my_dest->gid, gid);
>     sprintf(msg, "%04x:%06x:%06x:%s", my_dest->lid, my_dest->qpn,
>             my_dest->psn, gid);
>     if (write(sockfd, msg, sizeof msg) != sizeof msg) {
>         fprintf(stderr, "Couldn't send local address\n");
>         goto out;
>     }
>
>     if (read(sockfd, msg, sizeof msg) != sizeof msg) {
>         perror("client read");
>         fprintf(stderr, "Couldn't read remote address\n");
>         goto out;
>     }
>
> This read is failing for some reason. It is part of the address
> exchange, which runs over a TCP socket on some IP network (for example,
> IPoIB or Ethernet), not over the RC connection itself.
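
The "client read: Success" line in the failing run is consistent with read()
returning fewer bytes than sizeof msg without actually failing (for example,
because the peer closed the socket or sent a shorter message): errno is then
still 0 and perror() prints "Success". A minimal sketch of how the two cases
could be told apart, assuming a plain blocking socket; read_full is a
hypothetical helper name and is not part of rc_pingpong.c.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes, looping over short reads.
 * Returns the number of bytes actually read (less than len means the
 * peer closed the connection early), or -1 on a real error, in which
 * case errno has been set by read().
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -1;		/* real error, errno is meaningful */
		}
		if (n == 0)
			break;			/* EOF: peer closed, errno untouched */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

With a helper like this, a caller could report "peer sent N of M bytes" when
the return value is short, and call perror() only when it is -1.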
>
> >
> > And how can we test that our ftree topology is working fine?
> >
> > Please go through the attachment.
>
> It looks like LIDs are assigned, but I can't tell about routing from the
> info supplied. The topology looks relatively simple (5 switches,
> homogeneous 4x QDR links). Is the OpenSM log clean? Are there any
> fat-tree related messages? This is likely not an SM issue.
>
> The next issues are end node related (probably with IPoIB
> configuration). Can you ping between the nodes which fail
> rc_pingpong ? If not,
>
> -- Hal
>
> >
> > Thank You
> > Atul Yadav
> >
> >
> > On Sat, Apr 12, 2014 at 12:14 AM, Hal Rosenstock <hal at dev.mellanox.co.il> wrote:
> >
> > On 4/11/2014 2:21 PM, Atul Yadav wrote:
> > > Dear Team,
> > >
> > > We are trying to build a fat-tree topology.
> > > The details are given below:
> > > Unmanaged 36-port switches, quantity 5
> > > As per some blog, we need to modify the opensm.conf file,
> > > but we are unable to identify some parameters, like:
> > > root_guid_file ???????
> >
> > Fat tree routing will try to autodetect the roots but this may not
> > work and it is better to specify the root GUIDs. In your case, they
> > are the GUIDs for switches A and B.
> >
> > The root GUID file is then provided to OpenSM either via the conf
> > file or command line parameters. The command line parameter is
> > [-a | --root_guid_file <path to file>]
> >
> > OpenSM man page says:
> >
> > -a, --root_guid_file <file name>
> >        Set the root nodes for the Up/Down or Fat-Tree routing algorithm
> >        to the guids provided in the given file (one to a line).
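
Since the man page only says "one to a line", a quick check of the file before
handing it to OpenSM can catch stray text. A minimal sketch in C, assuming each
line holds a single hexadecimal GUID with an optional 0x prefix (an assumption
about the accepted format, not something the man page excerpt states);
parse_guid_line is a hypothetical helper, not an OpenSM function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one line of a GUID file.
 * Returns 0 and stores the value in *guid on success, -1 if the line
 * is empty, is not a hexadecimal number, or has trailing text.
 */
static int parse_guid_line(const char *line, uint64_t *guid)
{
	char *end;
	unsigned long long v;

	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0' || *line == '-')
		return -1;		/* empty line or negative number */

	errno = 0;
	v = strtoull(line, &end, 16);	/* base 16 also accepts a leading 0x */
	if (errno != 0 || end == line)
		return -1;		/* out of range or no digits at all */

	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return -1;		/* trailing junk after the GUID */

	*guid = (uint64_t)v;
	return 0;
}
```

A caller would read the file with fgets() and pass each line to this function;
a line such as 0x0002c90000000001 (a placeholder value, not a GUID from this
fabric) would be accepted, while a line such as "switchA" would be rejected.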
> >
> > It also says:
> >
> > If the root guid file is not provided ('-a' or '--root_guid_file'
> > options), the topology has to be pure fat-tree that complies with the
> > following rules:
> > - Tree rank should be between two and eight (inclusively)
> > - Switches of the same rank should have the same number
> >   of UP-going port groups*, unless they are root switches,
> >   in which case they shouldn't have UP-going ports at all.
> > - Switches of the same rank should have the same number
> >   of DOWN-going port groups, unless they are leaf switches.
> > - Switches of the same rank should have the same number
> >   of ports in each UP-going port group.
> > - Switches of the same rank should have the same number
> >   of ports in each DOWN-going port group.
> > - All the CAs have to be at the same tree level (rank).
> >
> > If the root guid file is provided, the topology doesn't have to be pure
> > fat-tree, and it should only comply with the following rules:
> > - Tree rank should be between two and eight (inclusively)
> > - All the Compute Nodes** have to be at the same tree level (rank).
> >   Note that non-compute node CAs are allowed here to be at different
> >   tree ranks.
> >
> > * ports that are connected to the same remote switch are referenced as
> >   port group.
> >
> > ** list of compute nodes (CNs) can be specified by -u or
> >    --cn_guid_file OpenSM options.
> >
> > -- Hal
> >
> > >
> > > Need your input for this ?
> > >
> > >
> > >
> > >
> > > Thank You
> > > Atul Yadav
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > ewg mailing list
> > > ewg at lists.openfabrics.org
> > > http://lists.openfabrics.org/mailman/listinfo/ewg
> >
> >
>
>