[openib-general] [PATCH] Opensm - duplicated guids handling
Yael Kalka
yael at mellanox.co.il
Thu Jan 19 04:08:46 PST 2006
Hi Hal,
We've noticed that currently if we have 2 hcas with duplicated guids
connected back-2-back, opensm gets stuck. The reason for that is that
in osm_vendor_set_sm() function - the second call trying to open the
/dev/infiniband/issm%id is stuck, since this file is already open.
The following patch fixes 2 things -
1. In osm_node_info_rcv.c - we've added a case that on cases of
duplicated guids - exit (unless a flag is set otherwise). Add this
exiting code also to the case where the nodes are connected back-2-back.
2. In osm_vendor_ibumad.c - add a static variable to avoid trying to
open /dev/inifiniband/issm%d file twice during the run of opensm.
Thanks,
Yael
Signed-off-by: Yael Kalka <yael at mellanox.co.il>
Index: libvendor/osm_vendor_ibumad.c
===================================================================
--- libvendor/osm_vendor_ibumad.c (revision 4951)
+++ libvendor/osm_vendor_ibumad.c (working copy)
@@ -1142,8 +1142,11 @@ osm_vendor_set_sm(
osm_umad_bind_info_t *p_bind = (osm_umad_bind_info_t *)h_bind;
osm_vendor_t *p_vend = p_bind->p_vend;
char issmstring[24];
+ static boolean_t osm_vendor_set_sm_indicator = FALSE;
OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm );
+ if (is_sm_val == FALSE || osm_vendor_set_sm_indicator == FALSE)
+ {
sprintf(issmstring, "/dev/infiniband/issm%d", p_vend->umad_port_id);
if (TRUE == is_sm_val) {
p_vend->issmfd = open(issmstring, 0);
@@ -1162,6 +1165,15 @@ osm_vendor_set_sm(
" mask failed: errno %d\n", errno);
p_vend->issmfd = -1;
}
+ if ( osm_vendor_set_sm_indicator == FALSE )
+ osm_vendor_set_sm_indicator = TRUE;
+ }
+ else
+ {
+ osm_log(p_vend->p_log, OSM_LOG_ERROR,
+ "osm_vendor_set_sm: ERR 5436: "
+ "Trying to set IS_SM capability mask again\n");
+ }
OSM_LOG_EXIT( p_vend->p_log );
}
Index: opensm/osm_node_info_rcv.c
===================================================================
--- opensm/osm_node_info_rcv.c (revision 4951)
+++ opensm/osm_node_info_rcv.c (working copy)
@@ -229,6 +229,14 @@ __osm_ni_rcv_set_links(
osm_dump_dr_path(p_rcv->p_log,
osm_physp_get_dr_path_ptr(p_physp),
OSM_LOG_ERROR);
+
+ osm_log( p_rcv->p_log, OSM_LOG_SYS,
+ "Errors on subnet. Duplicate GUID found "
+ "by link from a port to itself. "
+ "See osm log for more details\n");
+
+ if ( p_rcv->p_subn->opt.exit_on_fatal == TRUE )
+ exit( 1 );
}
else
{
More information about the general
mailing list