[PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)

Ira Weiny weiny2 at llnl.gov
Thu Apr 24 18:16:57 PDT 2008


On Fri, 25 Apr 2008 01:23:56 +0300
"Or Gerlitz" <or.gerlitz at gmail.com> wrote:

> On 4/25/08, Ira Weiny <weiny2 at llnl.gov> wrote:
> >
> > When I down link X and re-enable it node 2 and 3 do _not_ rejoin the mcast
> > group.
>
> bad!
>
> > Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based
> > kernel, and OpenSM 3.2.1-8341058-dirty.
>
> and what is the hca device and fw version at the nodes? maybe you send the
> list ipoib (debug_level=1 && multicast_debug_level=1)  debug output?
> 

I did not get any output with multicast_debug_level!  But I added some more
debugging and finally realized that the Set was not being sent.  :-(  I had put
a debug statement in OpenSM at the point where the flag is set, and therefore
thought that OpenSM had set the rereg bit on the port.  However, since no other
PortInfo data had changed, the "set" MAD was never sent.  (I am getting a bit
tongue tied reading this back.  I hope that all makes sense.)
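
To make the failure mode concrete, here is a minimal standalone sketch of the
logic.  It is not the OpenSM code: struct port_info, prepare_set() and the
"patched" switch are hypothetical names made up for illustration.  It only
shows why setting the rereg bit in the outgoing attribute has no effect unless
send_set is also raised.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much simplified stand-in for the PortInfo attribute. */
struct port_info {
	uint16_t base_lid;
	uint16_t sm_lid;
	bool client_rereg;
};

/*
 * Fill in the PortInfo to be sent and report whether a Set MAD would
 * actually go out.  "patched" is an illustrative switch: false models the
 * behaviour before the patch, true the behaviour after it.
 */
static bool prepare_set(const struct port_info *old_pi,
			uint16_t new_lid, uint16_t new_sm_lid,
			bool want_rereg, bool patched,
			struct port_info *out)
{
	bool send_set = false;

	*out = *old_pi;

	/* A changed field is what normally triggers the Set. */
	if (old_pi->base_lid != new_lid) {
		out->base_lid = new_lid;
		send_set = true;
	}
	if (old_pi->sm_lid != new_sm_lid) {
		out->sm_lid = new_sm_lid;
		send_set = true;
	}

	/* The rereg bit is a one-shot request, not a field that "changes". */
	out->client_rereg = want_rereg;
	if (want_rereg && patched)
		send_set = true;

	return send_set;
}
```

With patched == false, an unchanged LID and SM LID, and want_rereg == true, the
bit is set in the outgoing structure but the function returns false, which is
the situation I was seeing.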

Here is a patch which fixes the problem, at least with the partial subnet
configuration I explained before.  I still have to verify that it also fixes the
problem I originally reported.

Ira


From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001
From: Ira K. Weiny <weiny2 at llnl.gov>
Date: Thu, 24 Apr 2008 18:05:01 -0700
Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit


Signed-off-by: Ira K. Weiny <weiny2 at llnl.gov>
---
 opensm/opensm/osm_lid_mgr.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c
index ab23929..4d628d2 100644
--- a/opensm/opensm/osm_lid_mgr.c
+++ b/opensm/opensm/osm_lid_mgr.c
@@ -1099,9 +1099,14 @@ __osm_lid_mgr_set_physp_pi(IN osm_lid_mgr_t * const p_mgr,
 	if ((p_mgr->p_subn->first_time_master_sweep == TRUE || p_port->is_new)
 	    && !p_mgr->p_subn->opt.no_clients_rereg
 	    && ((p_old_pi->capability_mask & IB_PORT_CAP_HAS_CLIENT_REREG) !=
-		0))
+		0)) {
+		OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
+			"Setting client rereg on %s, port %d\n",
+			p_port->p_node->print_desc,
+			p_port->p_physp->port_num);
 		ib_port_info_set_client_rereg(p_pi, 1);
-	else
+		send_set = TRUE;
+	} else
 		ib_port_info_set_client_rereg(p_pi, 0);
 
 	/* We need to send the PortInfo Set request with the new sm_lid
-- 
1.5.1