[openib-general] opensm issue

Hal Rosenstock halr at voltaire.com
Mon Feb 26 13:25:28 PST 2007


Hi Ashish,

On Mon, 2007-02-26 at 16:04, Batwara, Ashish wrote:
> Hi,
> I am trying to bring up opensm, but it is not letting me. When I look at
> /var/log/messages, I see that it comes UP for a moment and then goes
> down again. Look for "SUBNET UP" in the logs below. Does anyone
> know what the problem is? I am using OFED-1.1.1 with patches from about
> a month ago.
> 
> Thanks
> Ashish
> 
> 
> Feb 26 14:38:37 p49 run_srp_daemon[7640]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:38:37 p49 run_srp_daemon[7642]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:38:46 p49 OpenSM[7433]: SM port is down
> Feb 26 14:38:53 p49 run_srp_daemon[7653]: starting srp_daemon: [HCA=mthca0] [port=2]
> Feb 26 14:38:53 p49 run_srp_daemon[7658]: starting srp_daemon: [HCA=mthca0] [port=1]
> Feb 26 14:38:56 p49 OpenSM[7433]: SM port is down
> Feb 26 14:38:56 p49 run_srp_daemon[7675]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:38:56 p49 run_srp_daemon[7680]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:39:06 p49 OpenSM[7433]: SM port is down
> Feb 26 14:39:26 p49 last message repeated 2 times
> Feb 26 14:39:26 p49 run_srp_daemon[7691]: starting srp_daemon: [HCA=mthca0] [port=1]
> Feb 26 14:39:26 p49 run_srp_daemon[7692]: starting srp_daemon: [HCA=mthca0] [port=2]
> Feb 26 14:39:29 p49 run_srp_daemon[7715]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:39:29 p49 run_srp_daemon[7716]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:39:36 p49 OpenSM[7433]: SM port is down
> Feb 26 14:39:56 p49 last message repeated 2 times
> Feb 26 14:39:59 p49 run_srp_daemon[7728]: starting srp_daemon: [HCA=mthca0] [port=1]
> Feb 26 14:39:59 p49 run_srp_daemon[7727]: starting srp_daemon: [HCA=mthca0] [port=2]
> Feb 26 14:40:02 p49 run_srp_daemon[7752]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:40:02 p49 run_srp_daemon[7751]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:40:06 p49 OpenSM[7433]: SM port is down
> Feb 26 14:40:26 p49 last message repeated 2 times
> Feb 26 14:40:32 p49 run_srp_daemon[7791]: starting srp_daemon: [HCA=mthca0] [port=2]
> Feb 26 14:40:32 p49 run_srp_daemon[7792]: starting srp_daemon: [HCA=mthca0] [port=1]
> Feb 26 14:40:35 p49 run_srp_daemon[7812]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:40:35 p49 run_srp_daemon[7817]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:40:36 p49 OpenSM[7433]: SM port is down
> Feb 26 14:40:46 p49 OpenSM[7433]: SM port is down
> Feb 26 14:40:56 p49 OpenSM[7433]: Entering MASTER state
> Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP
> Feb 26 14:41:05 p49 run_srp_daemon[7823]: starting srp_daemon: [HCA=mthca0] [port=1]
> Feb 26 14:41:05 p49 run_srp_daemon[7832]: starting srp_daemon: [HCA=mthca0] [port=2]
> Feb 26 14:41:06 p49 OpenSM[7433]: SM port is down
> Feb 26 14:41:08 p49 run_srp_daemon[7847]: failed srp_daemon: [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:41:14 p49 run_srp_daemon[7853]: failed srp_daemon: [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:41:16 p49 OpenSM[7433]: SM port is down

It appears that the link from your SM port to some switch (?) is losing
physical connectivity: the SM port goes down again within seconds of
SUBNET UP. Try a different (known good) cable.
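To check for this pattern in a longer /var/log/messages, it can help to pull out just the OpenSM entries and measure how long the subnet stays up. The following is a minimal sketch, assuming the syslog line format shown in the quoted log above; the function names `opensm_events` and `flaps_after_subnet_up` are hypothetical, and the year is a dummy value because syslog omits it. A short gap only suggests a flapping link; it does not by itself prove the cable is at fault.

```python
import re
from datetime import datetime

# Assumed syslog format, taken from the quoted log, e.g.
# "Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP"
OPENSM_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"OpenSM\[\d+\]:\s+(?P<msg>.*?)\s*$"
)


def opensm_events(lines):
    """Yield (timestamp, message) for each OpenSM syslog entry (hypothetical helper)."""
    for line in lines:
        m = OPENSM_LINE.match(line)
        if m:
            # Syslog omits the year; prepend a dummy year so timestamps can be subtracted.
            ts = datetime.strptime("2007 " + m.group("ts"), "%Y %b %d %H:%M:%S")
            yield ts, m.group("msg")


def flaps_after_subnet_up(lines):
    """Return the seconds between each 'SUBNET UP' and the next 'SM port is down'."""
    gaps = []
    up_at = None
    for ts, msg in opensm_events(lines):
        if msg == "SUBNET UP":
            up_at = ts
        elif msg == "SM port is down" and up_at is not None:
            gaps.append((ts - up_at).total_seconds())
            up_at = None
    return gaps


# Excerpt from the quoted log (trailing spaces as in the original messages).
log = """Feb 26 14:40:46 p49 OpenSM[7433]: SM port is down  
Feb 26 14:40:56 p49 OpenSM[7433]: Entering MASTER state  
Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP  
Feb 26 14:41:05 p49 run_srp_daemon[7823]: starting srp_daemon: [HCA=mthca0] [port=1]
Feb 26 14:41:06 p49 OpenSM[7433]: SM port is down  
""".splitlines()

print(flaps_after_subnet_up(log))  # [10.0]
```

A consistently small gap, like the 10 seconds in this excerpt, matches the picture of a port that trains up briefly and then drops.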

-- Hal
