[openib-general] opensm and faulty hardware
Hal Rosenstock
halr at voltaire.com
Tue Sep 27 11:21:06 PDT 2005
Hi Viswa,
On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote:
> I tracked down the issue to a bug in osm_lid_mgr.c
>
> function: __osm_lid_mgr_init_sweep(...)
>
> The bad hardware was returning an assigned LID of 0xFFFF. In this
> function there is a loop
> as follows where opensm is getting stuck (with line numbers):
>
> 392 p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
> 393
> 394 for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl );
> 395      p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl );
> 396      p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) )
> 397 {
> 398   osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
> 399   for (lid = disc_min_lid; lid <= disc_max_lid; lid++)  <===== Bug here
> 400     cl_ptr_vector_set(p_discovered_vec, lid, p_port );
> 401 }
>
> Since the disc_max_lid and disc_min_lid are 0xFFFF, and these are
> unsigned 16 bit numbers, the condition
0xFFFF is the permissive LID and is not LID routed. In fact, unicast LIDs
should be between 0x0001 and 0xbfff.
So I think the fix involves not allowing min/max to be set that way.
> in the for loop never becomes false, and opensm is stuck in the loop.
> There are a couple of other places in that
> function that need fixing too.
What are the other places you see?
Thanks.
-- Hal