Re: [ofa-general] OpenSM Problems/Questions
Hal Rosenstock
hal.rosenstock at gmail.com
Tue Sep 9 11:35:48 PDT 2008
Hi,
On Tue, Sep 9, 2008 at 9:01 AM, Matthew Trzyna <trzyna at us.ibm.com> wrote:
> Hello
>
>
> A "Basic Fabric Diagram" at the end.
>
>
> I am working with a customer implementing a large IB fabric who is
> encountering problems with OpenSM (OFED 1.3) that appeared when they added a
> new 264-node cluster (with its own 288-port IB switch) to their existing
> cluster. Two more 264-node clusters are planned to be added in the near
> future. They recently moved to SLES 10 SP1 and OFED 1.3 (before adding the
> new cluster) and had not been experiencing these problems before.
>
> Could you help provide answers to the questions listed below? Additional
> information about the configuration, including a basic fabric diagram, is
> provided after the questions.
>
> What parameters should be set on the non-SM nodes that affect how the Subnet
> Administrator functions?
> What parameters should be set on the SM node(s) that affect how the Subnet
> Administrator functions? And, what parameters should be removed from the SM
> node(s)? (e.g., ib_sa paths_per_dest=0x7f)
> How should SM failover be set up? How many failover SMs should be
> configured? (Failover must happen quickly and transparently, or GPFS will
> die everywhere due to timeouts if it takes too long.)
What is quickly enough?
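For reference, a minimal sketch of a typical two-SM failover setup, assuming
OpenSM from OFED 1.3 and that the two Admin nodes shown in the diagram below
are the intended SM hosts (hostnames and the exact priority values here are
illustrative only):

    # On the primary SM node: highest priority wins mastership (range 0-15)
    opensm --daemon --priority 15

    # On the standby SM node: lower priority; the standby polls the master's
    # SMInfo and takes over mastership if the master stops responding
    opensm --daemon --priority 14

    # From any node, check which SM is currently master and its priority
    sminfo

With the OFED init scripts the same effect comes from setting the priority in
the OpenSM configuration instead of on the command line, which is what the
"priority" line added at the bottom of opensm.conf (mentioned below) is doing.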
> Are there SA (Subnet Administrator) commands that should not be executed on
> a large "live" fabric? (ie. "saquery -p")
> Should GPFS be configured "off" on the SM node(s)?
> Do you know of any other OpenSM implementations that have 5 (or more)
> 288-port IB switches that might have already encountered/resolved some of these
> issues?
There are some deployments with multiple large switches.
Not sure what you mean by issues; I see questions above.
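Regarding the "saquery -p" question above: a query that dumps records for the
whole fabric asks the SA for a very large result set at this node count, so
narrower queries are usually kinder to a live fabric. A hedged sketch (the
LIDs are placeholders, and whether --src-to-dst is available depends on the
infiniband-diags build shipped with your OFED 1.3 install):

    # PathRecord for one src/dst pair only, rather than the whole fabric
    saquery --src-to-dst 23:118

    # NodeRecord for a single LID
    saquery -N 118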
> The following problem that is being encountered may also be SA/SM related. A
> node (NodeX) may be seen (through IPoIB) by all but a few nodes (NodesA-G).
> A ping from those nodes (NodesA-G) to NodeX returns "Destination Host
> Unreachable". A ping from NodeX to NodesA-G works.
Sounds like those nodes were perhaps unable to join the broadcast
group, possibly due to a rate issue.
-- Hal
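One way to check that theory, assuming infiniband-diags from OFED 1.3 and that
OpenSM logs to the default /var/log/opensm.log (the port GUID below is a
placeholder for one of the affected nodes):

    # Dump multicast group and member records from the SA; the IPv4 IPoIB
    # broadcast group MGID starts with ff12:401b. Compare the group's
    # rate/MTU against what the ports on NodesA-G actually support.
    saquery -g
    saquery -m

    # On the SM node, look for failed join attempts from the affected ports
    grep -i <port_guid_of_NodeA> /var/log/opensm.log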
> --------------------------------------------------------------------------------------------------
>
> System Information
>
> Here is the current opensm.conf file: (See attached file: opensm.conf)
>
> It is the default configuration from the OFED 1.3 build with "priority"
> added at the bottom. Note that /etc/init.d/opensmd sources
> /etc/sysconfig/opensm, not /etc/sysconfig/opensm.conf (opensm.conf was just
> copied to opensm). There are a couple of "proposed" settings, found on the
> web, that are commented out.
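As a sanity check, one way to confirm which file the init script actually
reads (paths as given above, nothing assumed beyond them):

    # Show every reference to sysconfig in the OpenSM init script
    grep -n sysconfig /etc/init.d/opensmd

    # Confirm the copied file exists and whether it still matches the original
    diff /etc/sysconfig/opensm /etc/sysconfig/opensm.conf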
>
> Following are the present settings that may affect the Fabric:
>
> /etc/infiniband/openib.conf
> SET_IPOIB_CM=no
>
> /etc/modprobe.conf.local
> options ib_ipoib send_queue_size=512 recv_queue_size=512
> options ib_sa paths_per_dest=0x7f
>
> /etc/sysctl.conf
> net.ipv4.neigh.ib0.base_reachable_time = 1200
> net.ipv4.neigh.default.gc_thresh3 = 3072
> net.ipv4.neigh.default.gc_thresh2 = 2500
> net.ipv4.neigh.default.gc_thresh1 = 2048
>
> /etc/sysconfig/opensm
> All defaults as supplied with OFED 1.3 OpenSM
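For completeness, the values the running kernel actually picked up can be read
back directly on each node; a minimal sketch (the /sys/module paths exist only
while the modules are loaded and only if the parameters are exposed read-only):

    # Module parameters as currently loaded
    cat /sys/module/ib_ipoib/parameters/send_queue_size
    cat /sys/module/ib_ipoib/parameters/recv_queue_size
    cat /sys/module/ib_sa/parameters/paths_per_dest

    # Effective sysctl values
    sysctl net.ipv4.neigh.ib0.base_reachable_time \
           net.ipv4.neigh.default.gc_thresh1 \
           net.ipv4.neigh.default.gc_thresh2 \
           net.ipv4.neigh.default.gc_thresh3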
>
>
> -------------------------------------------------------
>
>
> Basic Fabric Diagram
>
> +----------+
> |Top Level |-------------------+ 20 IO nodes
> +-----------------| 288 port |----------------+ 16 Virtual nodes
> | | IB Sw |------------+ | 2 Admin nodes
> | +------| |---+ | | (SM nodes)
> | | +----------+ | | | 4 Support nodes
> | | | | | |
> | | | | | |
> 24 24 24 24 24 24 <--uplinks
> | | | | | |
> | | | | | +------+
> | | | | | |
> |(BASE) |(SCU1) |(SCU2) |(SCU3) |(SCU4) |(SCU5)
> +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
> |288-port| |288-port| |288-port| |288-port| |288-port| |288-port|
> | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw | | IB Sw |
> +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
> 140-nodes 264-nodes 264-nodes 264-nodes 264-nodes 264-nodes
> WhiteBox Dell Dell IBM IBM IBM (future)
>
> NOTE: SCU4 is not currently connected to the Top Level Switch.
> We'd like to address these issues before making that connection.
>
> Subnet Managers are configured on nodes connected to the
> Top Level Switch.
>
> Let me know if you need any more information.
>
> Any help you could provide would be most appreciated.
>
> Thanks.
>
> Matt Trzyna
> IBM Linux Cluster Enablement
> 3039 Cornwallis Rd.
> RTP, NC 27709
> e-mail: trzyna at us.ibm.com
> Office: (919) 254-9917 Tie Line: 444
>