[ofa-general] [PATCH v2] opensm/osm_node_info_rcv.c: create physp for the newly discovered port of the known node

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Thu Feb 26 04:22:14 PST 2009


Hi Sasha,

[v2: adding CL_ASSERT() and changing comments]

This patch fixes bugzilla issue #1515.

The bug was discovered and analyzed by Line Holen.

Topology:
                 |---------------|
                 |      SW2      |
                 |---------------|
                   |x |y    |z |v
              |----|  |     |  |----|
              |       |     |       |
              |  |----|     |----|  |
              |  |               |  |
             a| b|              c| d|
      |---------------|     |---------------|
      |       SW1     |     |     SW3       |
      |---------------|     |---------------|
          |                             |
          |                             |
       HCA with SM                      HCA

During the discovery:

SM sends NodeInfo request to SW1
SM sends NodeInfo request to SW2 through link a->x
SM discovers new node SW2:
  - updates DR to SW2 to go through link a->x
  - creates physp x
SM sends NodeInfo request to SW2 through link b->y
SM discovers a known node SW2:
  - DOES NOT create physp y
  - updates DR to SW2 to go through link b->y

From now on, the DR to SW2 goes through port y, so OpenSM won't deal with
port y any more, leaving it uninitialized (no physp object for this port).

The fix is to create a physp for the newly discovered port of the known
switch node, the same way it is done for HCAs.
I also added a log message for the case that exposed the problem: one
side of the link is uninitialized (the valid-ports check fails). Perhaps
this log message should be an error message instead?

Debugged-by: Line Holen <Line.Holen at Sun.COM>
Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

---
 opensm/opensm/osm_node_info_rcv.c |   35 ++++++++++++++++++++++++++---------
 1 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
index c52c0d5..4d3724c 100644
--- a/opensm/opensm/osm_node_info_rcv.c
+++ b/opensm/opensm/osm_node_info_rcv.c
@@ -154,18 +154,17 @@ __osm_ni_rcv_set_links(IN osm_sm_t * sm,
 		goto _exit;
 	}

-	/*
-	   We have seen this neighbor node before, but we might
-	   not have seen this port on the neighbor node before.
-	   We should not set links to an uninitialized port on the
-	   neighbor, so check validity up front.  If it's not
-	   valid, do nothing, since we'll see this link again
-	   when we probe the neighbor.
-	 */
+	/* When setting the link, ports on both
+	   sides of the link should be initialized */
 	if (!osm_node_link_has_valid_ports(p_node, port_num,
 					   p_neighbor_node,
-					   p_ni_context->port_num))
+					   p_ni_context->port_num)) {
+		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+			"Link at node 0x%" PRIx64 ", port %u - no valid ports\n",
+			cl_ntoh64(osm_node_get_node_guid(p_node)), port_num);
+		CL_ASSERT(0);
 		goto _exit;
+	}

 	if (osm_node_link_exists(p_node, port_num,
 				 p_neighbor_node, p_ni_context->port_num)) {
@@ -537,8 +536,26 @@ __osm_ni_rcv_process_existing_switch(IN osm_sm_t * sm,
 				     IN osm_node_t * const p_node,
 				     IN const osm_madw_t * const p_madw)
 {
+
+	ib_smp_t *p_smp;
+	ib_node_info_t *p_ni;
+	uint8_t port_num;
+
 	OSM_LOG_ENTER(sm->p_log);

+	p_smp = osm_madw_get_smp_ptr(p_madw);
+	p_ni = (ib_node_info_t *) ib_smp_get_payload_ptr(p_smp);
+	port_num = ib_node_info_get_local_port_num(p_ni);
+
+	if (!osm_node_get_physp_ptr(p_node, port_num)) {
+		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+			"Creating physp for node GUID:0x%"
+			PRIx64 ", port %u\n",
+			cl_ntoh64(osm_node_get_node_guid(p_node)),
+			port_num);
+		osm_node_init_physp(p_node, p_madw);
+	}
+
 	/*
 	   If this switch has already been probed during this sweep,
 	   then don't bother reprobing it.
-- 
1.5.1.4




