[ewg] OPENSM CONFIGURATION

Hal Rosenstock hal at dev.mellanox.co.il
Sat Apr 12 08:38:30 PDT 2014


On 4/12/2014 11:29 AM, Atul Yadav wrote:
> Hi,
> 
> Yes, I am able to ping all the nodes connected to the InfiniBand switches.
> For more details, please see the attachment.

OpenSM looks fine, although it is very old (3.3.5). Is this SM host-based
or embedded in one of your switches?

I didn't see any output showing the pings working, but I'll take your
word for it. If pings work, I have no theory as to why rc_pingpong
wouldn't work.

-- Hal

> 
> 
> 
> Thanks
> Atul Yadav
> 
> 
> On Sat, Apr 12, 2014 at 7:28 PM, Hal Rosenstock <hal at dev.mellanox.co.il> wrote:
> 
>     On 4/12/2014 6:59 AM, Atul Yadav wrote:
>     > Hi,
>     >
>     > Thanks for replying.
>     > In this architecture, when we run ibv_rc_pingpong between two nodes
>     > connected to the same switch, it succeeds. But when we use two
>     > nodes attached to two different switches, we get an error.
>     >
>     > Success:
>     > [root@oss1 ~]# ibv_rc_pingpong
>     >   local address:  LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::
>     >   remote address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::
>     > 8192000 bytes in 0.01 seconds = 6992.74 Mbit/sec
>     > 1000 iters in 0.01 seconds = 9.37 usec/iter
>     > [root@oss1 ~]#
>     >
>     > [root@mds1 ~]# ibv_rc_pingpong 173.16.1.52
>     >   local address:  LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::
>     >   remote address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::
>     > 8192000 bytes in 0.01 seconds = 7084.97 Mbit/sec
>     > 1000 iters in 0.01 seconds = 9.25 usec/iter
>     > [root@mds1 ~]#
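The throughput lines in the successful runs can be cross-checked by hand. This worked arithmetic assumes rc_pingpong's default message size (4096 bytes) and iteration count (1000), and that it counts each iteration's bytes in both directions; under those assumptions the printed figures are mutually consistent (Mbit/s is the same as bit/µs):

$$
\begin{aligned}
\text{bytes} &= 2 \times 4096 \times 1000 = 8\,192\,000 \\
t &= \frac{8\,192\,000 \times 8\ \text{bit}}{6992.74\ \text{Mbit/s}} \approx 9372\ \mu\text{s} \approx 0.01\ \text{s} \\
\frac{t}{1000\ \text{iters}} &\approx 9.37\ \mu\text{s/iter}
\end{aligned}
$$

The same arithmetic gives about 9250 µs, or 9.25 usec/iter, for the 7084.97 Mbit/sec run; the "0.01 seconds" in both outputs is this elapsed time rounded to two decimals.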
>     >
>     >
>     >
>     >
>     > Error:
>     > [root@nalanda mvapich2-1.9]# ibv_rc_pingpong
>     >   local address:  LID 0x0001, QPN 0x56004e, PSN 0x704d51
>     >   remote address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2
>     >
>     > [root@mds1 ~]# ibv_rc_pingpong 173.16.1.1
>     >   local address:  LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2, GID ::
>     > client read: Success
>     > Couldn't read remote address
>     > [root@mds1 ~]#
> 
>     Looking at libibverbs/examples/rc_pingpong.c:
> 
>     static struct pingpong_dest *pp_client_exch_dest(const char *servername, int port,
>                                                      const struct pingpong_dest *my_dest)
>     {
>     ...
>             gid_to_wire_gid(&my_dest->gid, gid);
>             sprintf(msg, "%04x:%06x:%06x:%s", my_dest->lid, my_dest->qpn,
>                                                             my_dest->psn, gid);
>             if (write(sockfd, msg, sizeof msg) != sizeof msg) {
>                     fprintf(stderr, "Couldn't send local address\n");
>                     goto out;
>             }
> 
> 
>             if (read(sockfd, msg, sizeof msg) != sizeof msg) {
>                     perror("client read");
>                     fprintf(stderr, "Couldn't read remote address\n");
>                     goto out;
>             }
> 
>     This read is failing for some reason. This message exchange runs
>     over a TCP socket on an IP network (for example, IPoIB or Ethernet).
> 
>     >
>     > Also, how can we test whether our fat-tree topology is working correctly?
>     >
>     > Please see the attachment.
> 
>     It looks like LIDs are assigned, but I can't tell about routing from
>     the info supplied. The topology looks relatively simple (5 switches,
>     homogeneous 4x QDR links). Is the OpenSM log clean? Are there any
>     fat-tree-related messages? This is likely not an SM issue.
> 
>     The next issues to check are end-node related (probably IPoIB
>     configuration). Can you ping between the nodes that fail
>     rc_pingpong? If not,
> 
>     -- Hal
> 
>     >
>     > Thank You
>     > Atul Yadav
>     >
>     >
>     > On Sat, Apr 12, 2014 at 12:14 AM, Hal Rosenstock <hal at dev.mellanox.co.il> wrote:
>     >
>     >     On 4/11/2014 2:21 PM, Atul Yadav wrote:
>     >     > Dear Team,
>     >     >
>     >     > We are trying to build a fat-tree topology.
>     >     > The details are given below:
>     >     > Unmanaged 36-port switches, quantity: 5
>     >     > According to a blog post, we need to modify the opensm.conf file,
>     >     > but we are unable to work out some parameters, such as:
>     >     >  root_guid_file    ???????
>     >
>     >     Fat-tree routing will try to autodetect the roots, but this may not
>     >     work, and it is better to specify the root GUIDs. In your case, they
>     >     are the GUIDs for switches A and B.
>     >
>     >     The root GUID file is then provided to OpenSM either via the conf
>     >     file or via the command line. The command-line parameter is
>     >     [-a | --root_guid_file <path to file>]
>     >
>     >     The OpenSM man page says:
>     >
>     >            -a, --root_guid_file <file name>
>     >                   Set the root nodes for the Up/Down or Fat-Tree routing algorithm
>     >                   to the guids provided in the given file (one to a line).
>     >
>     >     It also says:
>     >
>     >            If the root guid file is not provided ('-a' or '--root_guid_file'
>     >            options), the topology has to be pure fat-tree that complies with the
>     >            following rules:
>     >              - Tree rank should be between two and eight (inclusively)
>     >              - Switches of the same rank should have the same number
>     >                of UP-going port groups*, unless they are root switches,
>     >                in which case they shouldn't have UP-going ports at all.
>     >              - Switches of the same rank should have the same number
>     >                of DOWN-going port groups, unless they are leaf switches.
>     >              - Switches of the same rank should have the same number
>     >                of ports in each UP-going port group.
>     >              - Switches of the same rank should have the same number
>     >                of ports in each DOWN-going port group.
>     >              - All the CAs have to be at the same tree level (rank).
>     >
>     >            If the root guid file is provided, the topology doesn't have to be pure
>     >            fat-tree, and it should only comply with the following rules:
>     >              - Tree rank should be between two and eight (inclusively)
>     >              - All the Compute Nodes** have to be at the same tree level (rank).
>     >                Note that non-compute node CAs are allowed here to be at different
>     >                tree ranks.
>     >
>     >            *  ports that are connected to the same remote switch are referenced as
>     >            port group.
>     >
>     >            **  list of compute nodes (CNs) can be specified by -u or
>     >            --cn_guid_file OpenSM options.
>     >
>     >     -- Hal
>     >
>     >     >
>     >     > We need your input on this.
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > Thank You
>     >     > Atul Yadav
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > _______________________________________________
>     >     > ewg mailing list
>     >     > ewg at lists.openfabrics.org
>     >     > http://lists.openfabrics.org/mailman/listinfo/ewg
>     >
>     >
> 
> 



