I tracked the issue down to a bug in osm_lid_mgr.c <br>
<br>
function: __osm_lid_mgr_init_sweep(...)<br>
<br>
The bad hardware was returning an assigned LID of 0xFFFF. In this function there is a loop,<br>
shown below with line numbers, where opensm gets stuck:<br>
<br>
392 p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;<br>
393<br>
394 for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl );<br>
395 p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl );<br>
396 p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) )<br>
397 {<br>
398 osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);<br>
399 for (lid = disc_min_lid; lid <= disc_max_lid; lid++) <===== Bug here<br>
400 cl_ptr_vector_set(p_discovered_vec, lid, p_port );<br>
401 }<br>
<br>
Since disc_min_lid and disc_max_lid are both 0xFFFF, and these are unsigned 16-bit numbers, incrementing lid past 0xFFFF wraps it<br>
back to 0. The condition in the for loop therefore never becomes false, and opensm is stuck in the loop. There are a couple of other places in that<br>
function that need fixing too.<br>
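<br>
To illustrate the wraparound, here is a minimal standalone sketch. It is not the actual opensm code or the actual fix: next_lid_u16 and count_lids_in_range are hypothetical helpers made up for this example, and the wider loop counter is just one possible way to make the loop terminate.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: what lid++ does to a 16-bit unsigned LID.
 * Incrementing 0xFFFF wraps to 0, so a uint16_t counter can never
 * exceed an upper bound of 0xFFFF. */
static uint16_t next_lid_u16(uint16_t lid)
{
    return (uint16_t)(lid + 1u);
}

/* Hypothetical helper: visits every LID in [min_lid, max_lid] and
 * returns how many were visited. The counter is 32 bits wide, so it
 * can reach 0x10000 and end the loop even when max_lid is 0xFFFF. */
static size_t count_lids_in_range(uint16_t min_lid, uint16_t max_lid)
{
    size_t visited = 0;
    uint32_t lid;

    for (lid = min_lid; lid <= max_lid; lid++)
        visited++; /* the real loop would record the port for this LID */

    return visited;
}
```

With the 32-bit counter, a range of 0xFFFF to 0xFFFF is visited exactly once and the loop exits. Whether a port reporting such a LID should be accepted at all is a separate question.<br>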
<br>
-Viswa<br>
<br><br><div><span class="gmail_quote">On 9/27/05, <b class="gmail_sendername">Viswanath Krishnamurthy</b> <<a href="mailto:viswa.krish@gmail.com">viswa.krish@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Log sent off-list...<br>
<br>
-Viswa<br>
<br><br><div><span class="q"><span class="gmail_quote">On 9/27/05, <b class="gmail_sendername">Eitan Zahavi</b> <<a href="mailto:eitan@mellanox.co.il" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
eitan@mellanox.co.il</a>> wrote:</span></span><div><span class="e" id="q_106989380cb75c01_2"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi Viswa,<br><br>Please send a full /var/log/osm.log file of opensm -V .<br>You can send us a copy off the list if it is too big:<br><br>yael and eitan in @<a href="http://mellanox.co.il" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
mellanox.co.il</a><br><br>EZ<br><br>
Hal Rosenstock wrote:<br>> On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote:<br>><br>>>I have an exerciser in the IB network. The exerciser seems to be<br>>>faulty/buggy. When opensm starts I do not
<br>>>see 'SUBNET UP" message. It says "Entering MASTER" and waits there.<br>>>Any new node inserted in this state is not assigned any LID. Anybody<br>>>seen such behavior ?<br>><br>
><br>> Any idea on how the IB exerciser misbehaves on the network ? Do you have<br>> an analyzer too ?<br>><br>> What does the OSM log show ?<br>><br>> -- Hal<br>><br><br></blockquote></span></div></div><br>
</blockquote></div><br>