<div dir="ltr">Hi,<div><br></div><div>Yes, I am able to ping all the nodes connected to the InfiniBand switch.</div><div>For more details, please see the attachment.</div><div><br></div><div><br></div><div><br></div><div>
Thanks</div><div>Atul Yadav</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Apr 12, 2014 at 7:28 PM, Hal Rosenstock <span dir="ltr"><<a href="mailto:hal@dev.mellanox.co.il" target="_blank">hal@dev.mellanox.co.il</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 4/12/2014 6:59 AM, Atul Yadav wrote:<br>
> Hi,<br>
><br>
> Thanks for replying.<br>
> In this architecture, when we run ibv_rc_pingpong between two nodes<br>
> connected to the same switch, we get a result. But when we use two<br>
> nodes on two different switches, we get an error.<br>
><br>
> Success:<br>
> [root@oss1 ~]# ibv_rc_pingpong<br>
> local address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::<br>
> remote address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::<br>
> 8192000 bytes in 0.01 seconds = 6992.74 Mbit/sec<br>
> 1000 iters in 0.01 seconds = 9.37 usec/iter<br>
> [root@oss1 ~]#<br>
><br>
> [root@mds1 ~]# ibv_rc_pingpong 173.16.1.52<br>
> local address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::<br>
> remote address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::<br>
> 8192000 bytes in 0.01 seconds = 7084.97 Mbit/sec<br>
> 1000 iters in 0.01 seconds = 9.25 usec/iter<br>
> [root@mds1 ~]#<br>
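The summary lines can be cross-checked by hand. Assuming rc_pingpong's defaults (4096-byte messages, 1000 iterations) and that the byte count covers both sent and received messages (4096 * 1000 * 2 = 8192000), the oss1 figures correspond to roughly 9372 microseconds of elapsed time. The C sketch below reproduces that arithmetic; the helper names are illustrative and not part of libibverbs.

```c
#include <stdio.h>

/* Illustrative helpers (not libibverbs API) for the arithmetic behind the
 * rc_pingpong summary lines. Assumption: total bytes = size * iters * 2,
 * i.e. both the sent and the received messages are counted. */
static long long total_bytes(long long size, long long iters)
{
    return size * iters * 2;
}

/* Bits transferred divided by elapsed microseconds gives Mbit/sec directly. */
static double mbit_per_sec(long long bytes, double usec)
{
    return bytes * 8.0 / usec;
}

/* Average round-trip time per iteration, in microseconds. */
static double usec_per_iter(double usec, long long iters)
{
    return usec / iters;
}

/* Print the two summary lines in the same format as the output above. */
static void print_summary(long long size, long long iters, double usec)
{
    long long bytes = total_bytes(size, iters);

    printf("%lld bytes in %.2f seconds = %.2f Mbit/sec\n",
           bytes, usec / 1000000.0, mbit_per_sec(bytes, usec));
    printf("%lld iters in %.2f seconds = %.2f usec/iter\n",
           iters, usec / 1000000.0, usec_per_iter(usec, iters));
}
```

For example, `print_summary(4096, 1000, 9372)` prints the same two lines as the oss1 run above, and 9250 microseconds gives the mds1 figures (7084.97 Mbit/sec, 9.25 usec/iter).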
><br>
><br>
><br>
><br>
> Error:<br>
> [root@nalanda mvapich2-1.9]# ibv_rc_pingpong<br>
> local address: LID 0x0001, QPN 0x56004e, PSN 0x704d51<br>
> remote address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2<br>
><br>
> [root@mds1 ~]# ibv_rc_pingpong 173.16.1.1<br>
> local address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2, GID ::<br>
> client read: Success<br>
> Couldn't read remote address<br>
> [root@mds1 ~]#<br>
<br>
</div></div>Looking at libibverbs/examples/rc_pingpong.c:<br>
<br>
static struct pingpong_dest *pp_client_exch_dest(const char *servername, int port,<br>
const struct pingpong_dest *my_dest)<br>
{<br>
...<br>
gid_to_wire_gid(&my_dest->gid, gid);<br>
sprintf(msg, "%04x:%06x:%06x:%s", my_dest->lid, my_dest->qpn,<br>
my_dest->psn, gid);<br>
if (write(sockfd, msg, sizeof msg) != sizeof msg) {<br>
fprintf(stderr, "Couldn't send local address\n");<br>
goto out;<br>
}<br>
<br>
<br>
if (read(sockfd, msg, sizeof msg) != sizeof msg) {<br>
perror("client read");<br>
fprintf(stderr, "Couldn't read remote address\n");<br>
goto out;<br>
}<br>
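The quoted code sends the local LID, QPN, PSN and GID to the server as one fixed-size text message over a TCP socket, then expects a reply of exactly the same size. A minimal C sketch of that wire format follows, assuming the GID is sent as 32 hex digits as in recent rc_pingpong versions; `struct dest_info`, `MSG_SIZE`, `encode_dest` and `decode_dest` are hypothetical names, not libibverbs API.

```c
#include <stdio.h>

/* Hypothetical mirror of the fields rc_pingpong exchanges (not the real struct). */
struct dest_info {
    unsigned lid;
    unsigned qpn;
    unsigned psn;
    char gid[33];               /* assumed: 32 hex digits + NUL */
};

/* Fixed message size: "LLLL:QQQQQQ:PPPPPP:" + 32 GID hex digits + NUL = 52 bytes. */
#define MSG_SIZE sizeof "0000:000000:000000:00000000000000000000000000000000"

/* Build the text message the client writes to the socket. */
static void encode_dest(const struct dest_info *d, char msg[MSG_SIZE])
{
    snprintf(msg, MSG_SIZE, "%04x:%06x:%06x:%s", d->lid, d->qpn, d->psn, d->gid);
}

/* Parse a received message; returns 1 on success, 0 if it is malformed. */
static int decode_dest(const char *msg, struct dest_info *d)
{
    return sscanf(msg, "%x:%x:%x:%32s", &d->lid, &d->qpn, &d->psn, d->gid) == 4;
}
```

With the mds1 values from the failing run (LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2) and an all-zero GID (printed as `GID ::`), `encode_dest` produces `0022:1c004a:07a0b2:` followed by 32 zeros: a 51-character message plus the terminating NUL.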
<br>
This read is failing for some reason. The message exchange runs over a TCP socket on some IP network (for example, IPoIB or Ethernet), before the ping-pong data is sent over InfiniBand.<br>
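The "client read: Success" line is a hint: perror() prints the text for the current errno, and read() leaves errno untouched when it merely returns fewer bytes than requested (for example 0 bytes because the other side closed the connection). So the client most likely got a short message or none at all, rather than a socket error. Note also that the nalanda output above shows no GID field while mds1 does, which may mean the two hosts run different rc_pingpong versions with different message sizes; that is an observation, not a confirmed cause. The sketch below is a hypothetical helper (not part of rc_pingpong) that reads a fixed-size message in a loop and tells a clean EOF apart from a short message and from a real error.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Outcome of reading one fixed-size message (hypothetical helper). */
enum read_status { READ_OK, READ_EOF, READ_SHORT, READ_ERROR };

/* Read exactly len bytes, retrying on partial reads and EINTR.
 * READ_EOF:   peer closed before sending anything (read() returned 0, errno untouched).
 * READ_SHORT: peer closed after sending only part of the message.
 * READ_ERROR: read() failed and errno describes why. */
static enum read_status read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: try again */
            return READ_ERROR;
        }
        if (n == 0)
            return got == 0 ? READ_EOF : READ_SHORT;
        got += (size_t)n;
    }
    return READ_OK;
}
```

In rc_pingpong terms, READ_EOF or READ_SHORT on the client would be consistent with the "client read: Success" message seen on mds1, while READ_ERROR would come with a meaningful errno text.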
<div class=""><br>
><br>
> And how can we test that our fat-tree topology is working correctly?<br>
><br>
> Please see the attachment.<br>
<br>
</div>It looks like LIDs are assigned, but I can't tell about routing from the information supplied. The topology looks relatively simple (5 switches, homogeneous 4x QDR links). Is the OpenSM log clean? Are there any fat-tree-related messages? This is likely not an SM issue.<br>
<br>
The next issues to check are on the end nodes (probably the IPoIB configuration). Can you ping between the nodes that fail rc_pingpong? If not,<br>
<br>
-- Hal<br>
<div class=""><br>
><br>
> Thank You<br>
> Atul Yadav<br>
><br>
><br>
> On Sat, Apr 12, 2014 at 12:14 AM, Hal Rosenstock <<a href="mailto:hal@dev.mellanox.co.il">hal@dev.mellanox.co.il</a><br>
</div><div><div class="h5">> <mailto:<a href="mailto:hal@dev.mellanox.co.il">hal@dev.mellanox.co.il</a>>> wrote:<br>
><br>
> On 4/11/2014 2:21 PM, Atul Yadav wrote:<br>
> > Dear Team,<br>
> ><br>
> > We are trying to build a fat-tree topology.<br>
> > The details are given below:<br>
> > Unmanaged 36-port switches, quantity 5<br>
> > According to a blog post, we need to modify the opensm.conf file,<br>
> > but we are unable to identify some parameters, such as:<br>
> > root_guid_file<br>
><br>
> Fat-tree routing will try to autodetect the roots, but this may not<br>
> work, and it is better to specify the root GUIDs. In your case, they<br>
> are the GUIDs for switches A and B.<br>
><br>
> The root GUID file is then provided to OpenSM either via the conf<br>
> file or a command-line parameter. The command-line parameter is [-a |<br>
> --root_guid_file <path to file>]<br>
><br>
> The OpenSM man page says:<br>
><br>
> -a, --root_guid_file <file name><br>
> Set the root nodes for the Up/Down or Fat-Tree routing algorithm<br>
> to the guids provided in the given file (one to a line).<br>
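Since the root GUID file is just one GUID per line, it may be worth sanity-checking it before passing it with -a or setting root_guid_file in opensm.conf. Below is a minimal C sketch of such a check. It assumes GUIDs are written as hexadecimal numbers (for example a made-up value like 0x0002c90300001234) and treats blank lines as ignorable; `parse_guid_line` is a hypothetical helper, not OpenSM's own parser, which may accept other forms.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical checker for one line of a root GUID file; OpenSM's parser may differ.
 * Returns 1 and stores the GUID on success, 0 for a blank line, -1 if malformed. */
static int parse_guid_line(const char *line, uint64_t *guid)
{
    char *end;
    unsigned long long v;

    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return 0;                       /* blank line: nothing to check */
    if (!isxdigit((unsigned char)*line))
        return -1;                      /* rejects signs, switch names, etc. */
    errno = 0;
    v = strtoull(line, &end, 16);       /* base 16 also accepts a leading 0x */
    if (errno == ERANGE)
        return -1;                      /* too large to represent */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0' || v == 0)
        return -1;                      /* trailing text, or an all-zero GUID */
    *guid = (uint64_t)v;
    return 1;
}
```

For the topology described in this thread, the file would hold two such lines, one for switch A and one for switch B.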
><br>
> It also says:<br>
><br>
> If the root guid file is not provided ('-a' or '--root_guid_file'<br>
> options), the topology has to be pure fat-tree that complies with the<br>
> following rules:<br>
> - Tree rank should be between two and eight (inclusively)<br>
> - Switches of the same rank should have the same number<br>
> of UP-going port groups*, unless they are root switches,<br>
> in which case they shouldn't have UP-going ports at all.<br>
> - Switches of the same rank should have the same number<br>
> of DOWN-going port groups, unless they are leaf switches.<br>
> - Switches of the same rank should have the same number<br>
> of ports in each UP-going port group.<br>
> - Switches of the same rank should have the same number<br>
> of ports in each DOWN-going port group.<br>
> - All the CAs have to be at the same tree level (rank).<br>
><br>
> If the root guid file is provided, the topology doesn't have to be pure<br>
> fat-tree, and it should only comply with the following rules:<br>
> - Tree rank should be between two and eight (inclusively)<br>
> - All the Compute Nodes** have to be at the same tree level (rank).<br>
> Note that non-compute node CAs are allowed here to be at different<br>
> tree ranks.<br>
><br>
> * ports that are connected to the same remote switch are referenced as<br>
> port group.<br>
><br>
> ** list of compute nodes (CNs) can be specified by -u or<br>
> --cn_guid_file OpenSM options.<br>
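To make the root-GUID-file case concrete, here is a toy C check of its two rules (tree rank between two and eight, all compute nodes at the same rank). It is an illustration under our own assumptions, not OpenSM's fat-tree code: `tree_rank` is taken to be the number of switch levels, and `cn_level[i]` the switch level (1 = roots) that compute node i is attached to; both names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy check of the two rules quoted above for the root-GUID-file case.
 * Assumed data model (not OpenSM's): tree_rank = number of switch levels,
 * cn_level[i] = switch level (1 = roots) where compute node i is attached. */
static bool ftree_rules_ok(int tree_rank, const int *cn_level, size_t n_cn)
{
    if (tree_rank < 2 || tree_rank > 8)
        return false;                   /* rank must be between two and eight, inclusive */
    for (size_t i = 1; i < n_cn; i++)
        if (cn_level[i] != cn_level[0])
            return false;               /* all compute nodes must share one rank */
    return true;
}
```

For instance, a two-level tree with two root switches and three leaf switches, with every compute node on a leaf, passes; a compute node hanging off a root switch would fail unless it is left out of the list given with -u / --cn_guid_file.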
><br>
> -- Hal<br>
><br>
> ><br>
> > We need your input on this.<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > Thank You<br>
> > Atul Yadav<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > ewg mailing list<br>
</div></div>> > <a href="mailto:ewg@lists.openfabrics.org">ewg@lists.openfabrics.org</a> <mailto:<a href="mailto:ewg@lists.openfabrics.org">ewg@lists.openfabrics.org</a>><br>
> > <a href="http://lists.openfabrics.org/mailman/listinfo/ewg" target="_blank">http://lists.openfabrics.org/mailman/listinfo/ewg</a><br>
><br>
><br>
<br>
</blockquote></div><br></div>