[ewg] Huge size for opensm process (5.7GB)

Sandeep Dhavale Sandeep_Dhavale at symantec.com
Mon Sep 30 21:56:54 PDT 2013


Thanks Doug! I will try that and update the thread with the findings!

Regards,
Sandeep.

-----Original Message-----
From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Doug Ledford
Sent: Monday, September 30, 2013 11:26 AM
To: ewg at lists.openfabrics.org
Subject: Re: [ewg] Huge size for opensm process (5.7GB)

On 09/26/13 15:28, Sandeep Dhavale wrote:
> Hello,
> 
> I have a setup where 2 nodes are connected back to back. So my subnet 
> is basically 2 nodes.
> 
> This is a RHEL 6.3 setup, and the opensm version running is:
> 
> [root@intel-eva1 ~]# rpm -qi opensm
> Name        : opensm                       Relocations: (not relocatable)
> Version     : 3.3.13                            Vendor: Red Hat, Inc.
> Release     : 1.el6                         Build Date: Tue 28 Feb 2012 07:53:06 PM PST
> Install Date: Thu 04 Jul 2013 01:44:25 AM PDT      Build Host: x86-003.build.bos.redhat.com
> Group       : System Environment/Daemons    Source RPM: opensm-3.3.13-1.el6.src.rpm
> Size        : 1317469                          License: GPLv2 or BSD
> Signature   : RSA/8, Wed 30 May 2012 11:14:47 AM PDT, Key ID 199e2f91fd431d51
> Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
> URL         : http://www.openfabrics.org/
> Summary     : OpenIB InfiniBand Subnet Manager and management utilities
> 
> The opensm process has grown to 5.7GB. I do not think this is normal
> for a 2-node subnet.
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  DATA COMMAND
> 15306 root      20   0 5851m 2568  692 S  0.0  0.0   0:06.04 5.7g opensm
> 
> [root@intel-eva1 ~]# ps -ef | grep opensm
> root     15306     1  0 01:04 ?        00:00:06 /usr/sbin/opensm -B -F /etc/rdma/opensm.conf.[0-9]*
> 
> One thing to notice is that /var/log/opensm.log is flooded with
> messages like the ones below every 10 seconds:
> 
> Sep 26 12:21:58 384194 [E2BA2700] 0x01 -> osm_prtn_make_partitions: Partition configuration /etc/rdma/partitions.conf is not accessible (No such file or directory)
> Sep 26 12:21:58 385098 [E2BA2700] 0x02 -> SUBNET UP
> Sep 26 12:22:08 384306 [E2BA2700] 0x01 -> osm_prtn_make_partitions: Partition configuration /etc/rdma/partitions.conf is not accessible (No such file or directory)
> Sep 26 12:22:08 385189 [E2BA2700] 0x02 -> SUBNET UP
> 
> Can anybody shed some light on what might be wrong? Does opensm
> maintain the log in memory as well? I haven't put a limit on the size
> of the log in /etc/rdma/opensm.conf.

You're hitting a bug in the opensm startup script on that release.  It's trying to run with a non-existent config file (this was fixed by adding shopt -s nullglob to the startup script).  I would update to the later opensm from rhel6.4 where this is no longer an issue.
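
For anyone wondering what nullglob changes here, the following is a minimal bash sketch, not the actual RHEL init script. It uses a hypothetical scratch directory in place of /etc/rdma, and the variable names are made up for illustration. Without nullglob, a glob that matches nothing is handed on as the literal pattern, which is consistent with the "-F /etc/rdma/opensm.conf.[0-9]*" argument visible in the ps output above. With nullglob, the unmatched pattern expands to nothing.

```shell
#!/bin/bash
# Hypothetical scratch directory standing in for /etc/rdma.
# It is empty, so no numbered opensm.conf.N files exist in it.
confdir=$(mktemp -d)

# Default behaviour: an unmatched glob stays as the literal pattern string.
shopt -u nullglob
without=( "$confdir"/opensm.conf.[0-9]* )

# With nullglob set: an unmatched glob expands to zero words.
shopt -s nullglob
with=( "$confdir"/opensm.conf.[0-9]* )

echo "without nullglob: ${#without[@]} word(s): ${without[*]}"
echo "with nullglob:    ${#with[@]} word(s)"

rm -rf "$confdir"
```

A script that loops over the expanded list would therefore start one opensm instance with a bogus config path in the first case and none in the second.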


--
Doug Ledford <dledford at redhat.com>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford



