[openib-general] opensm and faulty hardware

Viswanath Krishnamurthy viswa.krish at gmail.com
Tue Sep 27 11:13:31 PDT 2005


I tracked down the issue to a bug in osm_lid_mgr.c

function: __osm_lid_mgr_init_sweep(...)

The bad hardware was returning an assigned LID of 0xFFFF. This function
contains the following loop, where opensm gets stuck (shown with line
numbers):

392   p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
393
394   for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl );
395        p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl );
396        p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) )
397   {
398     osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
399     for (lid = disc_min_lid; lid <= disc_max_lid; lid++)   <===== Bug here
400       cl_ptr_vector_set(p_discovered_vec, lid, p_port );
401   }

Since disc_min_lid and disc_max_lid are both 0xFFFF, and the LID variables
are unsigned 16-bit numbers, lid++ wraps from 0xFFFF back to 0 instead of
exceeding disc_max_lid. The condition in the for loop therefore never
becomes false, and opensm is stuck in the loop. There are a couple of other
places in that function that need fixing too.
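
To illustrate the wraparound, here is a minimal standalone sketch of one
possible way to make such a loop terminate: use a loop counter wider than
16 bits. This is not the actual opensm fix. The names mark_lid and
mark_lid_range are hypothetical, and the byte array is only a stand-in for
the cl_ptr_vector_set() call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cl_ptr_vector_set(): flags one slot per LID.
 * The caller must supply an array of 65536 entries. */
static void mark_lid(uint8_t *seen, uint16_t lid)
{
    seen[lid] = 1;
}

/* Visits every LID in [min_lid, max_lid] once and returns the count.
 * With a uint16_t counter, "lid <= max_lid" is always true when
 * max_lid == 0xFFFF, because lid++ wraps from 0xFFFF to 0.
 * A 32-bit counter can reach 0x10000, so the comparison can fail. */
static size_t mark_lid_range(uint8_t *seen, uint16_t min_lid, uint16_t max_lid)
{
    size_t visited = 0;
    uint32_t lid; /* deliberately wider than the 16-bit LID type */

    for (lid = min_lid; lid <= max_lid; lid++) {
        mark_lid(seen, (uint16_t)lid);
        visited++;
    }
    return visited;
}
```

With this change a range of 0xFFFF..0xFFFF is visited exactly once. Whether
a port reporting LID 0xFFFF should be processed at all is a separate
question that this sketch does not address.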

-Viswa


On 9/27/05, Viswanath Krishnamurthy <viswa.krish at gmail.com> wrote:
>
> Log sent off-list...
>
> -Viswa
>
>
> On 9/27/05, Eitan Zahavi <eitan at mellanox.co.il> wrote:
> >
> > Hi Viswa,
> >
> > Please send a full /var/log/osm.log file of opensm -V .
> > You can send us a copy off the list if it is too big:
> >
> > yael and eitan in @mellanox.co.il <http://mellanox.co.il>
> >
> > EZ
> >
> > Hal Rosenstock wrote:
> > > On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote:
> > >
> > >>I have an exerciser in the IB network. The exerciser seems to be
> > >>faulty/buggy. When opensm starts I do not
> > >>see "SUBNET UP" message. It says "Entering MASTER" and waits there.
> > >>Any new node inserted in this state is not assigned any LID. Anybody
> > >>seen such behavior ?
> > >
> > >
> > > Any idea on how the IB exerciser misbehaves on the network ? Do you
> > have
> > > an analyzer too ?
> > >
> > > What does the OSM log show ?
> > >
> > > -- Hal
> > >
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> >
> >
>