[openib-general] Re: [PATCH] Opensm - duplicated guids handling
Hal Rosenstock
halr at voltaire.com
Thu Jan 19 06:51:13 PST 2006
Hi Yael,
On Thu, 2006-01-19 at 07:08, Yael Kalka wrote:
> Hi Hal,
>
> We've noticed that currently if we have 2 hcas with duplicated guids
I renew my comment about duplicated GUIDs. GUID uniqueness is a pretty
fundamental thing that MUST NOT be violated per the IBA spec. I understand
there are processes in place that make duplication more likely than it
should be.
If we go down this path, there are other things that fall into this
category and I believe this to be a slippery slope.
I am still willing to go ahead with this patch or some variant of it.
Some questions embedded in the patch.
> connected back-2-back, opensm gets stuck.
Not sure I quite understand the configuration. Are the two HCAs with the
duplicated guids in the same machine and connected back to back ? Is
that the case you are referring to ?
> The reason for that is that
> in osm_vendor_set_sm() function - the second call trying to open the
> /dev/infiniband/issm%d is stuck, since this file is already open.
> The following patch fixes 2 things -
> 1. In osm_node_info_rcv.c - we've added handling so that on duplicated
> guids opensm exits (unless a flag is set otherwise). The same exit code
> is also added to the case where the nodes are connected back-2-back.
> 2. In osm_vendor_ibumad.c - add a static variable to avoid trying to
> open the /dev/infiniband/issm%d file twice during the run of opensm.
The problem is that the second open hangs, right? So rather than making
the changes to osm_vendor_ibumad.c below, how about changing the flags on
the open from 0 to O_NONBLOCK? Does that work for you?
If so, I will commit that approach with the change below to
osm_node_info_rcv.c. Please let me know. Thanks.
-- Hal
> Signed-off-by: Yael Kalka <yael at mellanox.co.il>
>
> Index: libvendor/osm_vendor_ibumad.c
> ===================================================================
> --- libvendor/osm_vendor_ibumad.c (revision 4951)
> +++ libvendor/osm_vendor_ibumad.c (working copy)
> @@ -1142,8 +1142,11 @@ osm_vendor_set_sm(
> osm_umad_bind_info_t *p_bind = (osm_umad_bind_info_t *)h_bind;
> osm_vendor_t *p_vend = p_bind->p_vend;
> char issmstring[24];
> + static boolean_t osm_vendor_set_sm_indicator = FALSE;
>
> OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm );
> + if (is_sm_val == FALSE || osm_vendor_set_sm_indicator == FALSE)
I may have a comment on this based on the answer to the below.
> + {
> sprintf(issmstring, "/dev/infiniband/issm%d", p_vend->umad_port_id);
> if (TRUE == is_sm_val) {
> p_vend->issmfd = open(issmstring, 0);
> @@ -1162,6 +1165,15 @@ osm_vendor_set_sm(
> " mask failed: errno %d\n", errno);
> p_vend->issmfd = -1;
> }
> + if ( osm_vendor_set_sm_indicator == FALSE )
> + osm_vendor_set_sm_indicator = TRUE;
> + }
> + else
> + {
> + osm_log(p_vend->p_log, OSM_LOG_ERROR,
> + "osm_vendor_set_sm: ERR 5436: "
> + "Trying to set IS_SM capability mask again\n");
> + }
> OSM_LOG_EXIT( p_vend->p_log );
> }
Does osm_vendor_set_sm_indicator ever need to be reset to FALSE ?
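One possible way to avoid the reset question, sketched under assumptions:
key the guard on the file descriptor instead of a separate static flag, so
that clearing IS_SM re-arms a later set. The struct sm_state and the set_sm
function are hypothetical stand-ins for the vendor object and
osm_vendor_set_sm, and the sketch assumes issmfd starts out as -1, which
the patch does not show.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the vendor object; only the fd matters here. */
struct sm_state {
    int issmfd; /* -1 while the issm file is not held open */
};

/*
 * Sketch: returns 1 if the state changed, 0 if the request was a no-op,
 * -1 if open() failed. "path" would be the issm device node.
 */
static int set_sm(struct sm_state *st, const char *path, int is_sm)
{
    if (is_sm) {
        if (st->issmfd >= 0)
            return 0;            /* already set: do not open twice */
        st->issmfd = open(path, O_RDONLY | O_NONBLOCK);
        return st->issmfd >= 0 ? 1 : -1;
    }
    if (st->issmfd < 0)
        return 0;                /* already cleared */
    close(st->issmfd);
    st->issmfd = -1;             /* clearing allows a later set to reopen */
    return 1;
}
```

With this shape a repeated set is a harmless no-op that the caller can log,
and no separate indicator has to be reset.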
> Index: opensm/osm_node_info_rcv.c
> ===================================================================
> --- opensm/osm_node_info_rcv.c (revision 4951)
> +++ opensm/osm_node_info_rcv.c (working copy)
> @@ -229,6 +229,14 @@ __osm_ni_rcv_set_links(
> osm_dump_dr_path(p_rcv->p_log,
> osm_physp_get_dr_path_ptr(p_physp),
> OSM_LOG_ERROR);
> +
> + osm_log( p_rcv->p_log, OSM_LOG_SYS,
> + "Errors on subnet. Duplicate GUID found "
> + "by link from a port to itself. "
> + "See osm log for more details\n");
> +
> + if ( p_rcv->p_subn->opt.exit_on_fatal == TRUE )
> + exit( 1 );
> }
> else
> {
>
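The osm_node_info_rcv.c hunk adds a system-log message and an optional exit
to the branch that reports a link from a port to itself. A simplified
sketch of that decision follows. The condition that selects the branch is
outside the hunk, so the check below (both ends report the same node GUID
and port number) is an assumption, and struct link_end, is_self_link and
fatal_status are hypothetical names, not OpenSM code.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one end of a link. */
struct link_end {
    uint64_t node_guid;
    uint8_t  port_num;
};

/*
 * Returns 1 when both ends report the same node GUID and port number,
 * which a single cable cannot produce unless a GUID is duplicated.
 */
static int is_self_link(const struct link_end *a, const struct link_end *b)
{
    return a->node_guid == b->node_guid && a->port_num == b->port_num;
}

/*
 * Returns the process exit status to use, or 0 to keep running.
 * Mirrors the patch: exit( 1 ) only when exit_on_fatal is set.
 */
static int fatal_status(int self_link, int exit_on_fatal)
{
    return (self_link && exit_on_fatal) ? 1 : 0;
}
```

Returning a status instead of calling exit() directly keeps the sketch easy
to test; the patch itself calls exit( 1 ) in place.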