[Users] Weird IPoIB issue

Hal Rosenstock hal.rosenstock at gmail.com
Fri Nov 15 10:35:05 PST 2013


It just means their version is based on 3.3.5, which is really old and
moldy. They've made a few changes internally. If they say you can try a
stock OpenSM, go for it and hopefully things will work properly. You can
even try with this config file. The transaction timeout is lengthened from
200 msec to 2 sec.
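
For reference, the relevant knob looks like the fragment below. This is only
an illustration (not the attached file itself), and the config path in the
opensm invocation is just an example; -F is the same config-file flag your
Xsigo startup line already uses.

# opensm config fragment: maximum time in [msec] allowed for a
# transaction to complete (the stock default is 200)
transaction_timeout 2000

# point a stock OpenSM at that config file
opensm -F /etc/opensm/opensm.conf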

I do not recall which version started supporting use_mfttop, and I don't
have the time to figure that out. But I doubt that 3.3.5 supports it, as it
doesn't show up in your config dump unless I missed it. You can just try it
and see if your results change (for the better). The opensm log will show a
complaint about an unknown config option if it's not supported.
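
If you want to test it, something like the following minimal check should
do. The config path is just an example, and the exact wording of the
"unknown option" complaint varies by version, so grep loosely; the log path
below is taken from your dump.

# add the option to the config file you pass with -F
echo "use_mfttop FALSE" >> /etc/opensm/opensm.conf

# after restarting opensm with that config, see whether it complained
grep -i use_mfttop /var/log/opensm.log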

Note that once a switch has MulticastFDBTop set to 0xbfff, configuring
OpenSM not to use_mfttop won't, I think, reset the field to 0; it will just
ignore it. The switches need to be reset so this field is 0. Start the
process with that, and verify using smpquery si on all your switches.
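
Something along these lines would check them all; this is an untested
sketch that assumes ibswitches prints a "... lid <N> lmc <M>" line per
switch (as infiniband-diags normally does) and that smpquery labels the
field MulticastFDBTop as in your output.

# dump MulticastFDBTop from SwitchInfo for every switch in the fabric
for lid in $(ibswitches | sed -n 's/.* lid \([0-9]*\) .*/\1/p'); do
    printf "switch lid %s: " "$lid"
    smpquery si "$lid" | grep MulticastFDBTop
done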

-- Hal


On Fri, Nov 15, 2013 at 12:49 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:

> Hal,
>
> From what I can tell, the start up script that starts opensm in Xsigo only
> specifies the following command line parameters:
>
> "-t 2000 -L 100 -y -q loopback -P /tmp/osmpart.conf -F
> /opt/xsigo/xsigos/current/ofed/etc/opensm.opts"
>
> The opensm.opts contains:
> # SA database file name
> sa_db_file /var/log/opensm-sa.dump
>
> # If TRUE causes OpenSM to dump SA database at the end of
> # every light sweep, regardless of the verbosity level
> sa_db_dump TRUE
>
> # The directory to hold the file OpenSM dumps
> dump_files_dir /var/log/
>
> And the osmpart.conf contains:
>     Default=0x7fff,ipoib: ALL=full ;
>
> They are running OpenSM 3.3.5 so it seems that it is pretty vanilla.
> However, I know that we set the priority of the SMs in their management
> tool, so I'm wondering if they are passing some additional parameters
> through the loopback interface. I guess they could have patched the OpenSM
> code, but I'm not sure they have done that.
>
> I logged into the opensm console and dumped the config. Disable multicast
> is set to false. It looks like MulticastFDBTop was implemented back in
> 2009, so this version should support it. Can I set use_mfttop using this
> version? If not, do you know what version I can?
>
> In my testing with ibsim, the LIDs between the real environment and
> simulated environment appeared to be the same as well as the routing, so I
> don't believe that I'd run into a problem moving to OpenSM as a primary SM.
> Do you see anything in the running config that would be concerning to you
> that should be configured with OpenSM? The differences that I see that I
> think may drastically change the network behavior are transaction_timeout
> and babbling_port_policy, but I'm not 100% sure.
>
> OpenSM $ dump_conf
> #
> # DEVICE ATTRIBUTES OPTIONS
> #
> # The port GUID on which the OpenSM is running
> guid 0x0000000000000000
>
> # M_Key value sent to all ports qualifying all Set(PortInfo)
> m_key 0x0000000000000000
>
> # The lease period used for the M_Key on this subnet in [sec]
> m_key_lease_period 0
>
> # SM_Key value of the SM used for SM authentication
> sm_key 0x0000000000000001
>
> # SM_Key value to qualify rcv SA queries as 'trusted'
> sa_key 0x0000000000000001
>
> # Note that for both values above (sm_key and sa_key)
> # OpenSM version 3.2.1 and below used the default value '1'
> # in a host byte order, it is fixed now but you may need to
> # change the values to interoperate with old OpenSM running
> # on a little endian machine.
>
> # Subnet prefix used on this subnet
> subnet_prefix 0xfe80000000000000
>
> # The LMC value used on this subnet
> lmc 0
>
> # lmc_esp0 determines whether LMC value used on subnet is used for
> # enhanced switch port 0. If TRUE, LMC value for subnet is used for
> # ESP0. Otherwise, LMC value for ESP0s is 0.
> lmc_esp0 FALSE
>
> # sm_sl determines SMSL used for SM/SA communication
> sm_sl 0
>
> # The code of maximal time a packet can live in a switch
> # The actual time is 4.096usec * 2^<packet_life_time>
> # The value 0x14 disables this mechanism
> packet_life_time 0x12
>
> # The number of sequential packets dropped that cause the port
> # to enter the VLStalled state. The result of setting this value to
> # zero is undefined.
> vl_stall_count 0x07
>
> # The number of sequential packets dropped that cause the port
> # to enter the VLStalled state. This value is for switch ports
> # driving a CA or router port. The result of setting this value
> # to zero is undefined.
> leaf_vl_stall_count 0x07
>
> # The code of maximal time a packet can wait at the head of
> # transmission queue.
> # The actual time is 4.096usec * 2^<head_of_queue_lifetime>
> # The value 0x14 disables this mechanism
> head_of_queue_lifetime 0x12
>
> # The maximal time a packet can wait at the head of queue on
> # switch port connected to a CA or router port
> leaf_head_of_queue_lifetime 0x10
>
> # Limit the maximal operational VLs
> max_op_vls 5
>
> # Force PortInfo:LinkSpeedEnabled on switch ports
> # If 0, don't modify PortInfo:LinkSpeedEnabled on switch port
> # Otherwise, use value for PortInfo:LinkSpeedEnabled on switch port
> # Values are (IB Spec 1.2.1, 14.2.5.6 Table 146 "PortInfo")
> #    1: 2.5 Gbps
> #    3: 2.5 or 5.0 Gbps
> #    5: 2.5 or 10.0 Gbps
> #    7: 2.5 or 5.0 or 10.0 Gbps
> #    2,4,6,8-14 Reserved
> #    Default 15: set to PortInfo:LinkSpeedSupported
> force_link_speed 15
>
> # The subnet_timeout code that will be set for all the ports
> # The actual timeout is 4.096usec * 2^<subnet_timeout>
> subnet_timeout 18
>
> # Threshold of local phy errors for sending Trap 129
> local_phy_errors_threshold 0x08
>
> # Threshold of credit overrun errors for sending Trap 130
> overrun_errors_threshold 0x08
>
> #
> # PARTITIONING OPTIONS
> #
> # Partition configuration file to be used
> partition_config_file /tmp/osmpart.conf
>
> # Disable partition enforcement by switches
> no_partition_enforcement FALSE
>
> #
> # SWEEP OPTIONS
> #
> # The number of seconds between subnet sweeps (0 disables it)
> sweep_interval 10
>
> # If TRUE cause all lids to be reassigned
> reassign_lids FALSE
>
> # If TRUE forces every sweep to be a heavy sweep
> force_heavy_sweep FALSE
>
> # If TRUE every trap will cause a heavy sweep.
> # NOTE: successive identical traps (>10) are suppressed
> sweep_on_trap TRUE
>
> #
> # ROUTING OPTIONS
> #
> # If TRUE count switches as link subscriptions
> port_profile_switch_nodes FALSE
>
> # Name of file with port guids to be ignored by port profiling
> port_prof_ignore_file (null)
>
> # The file holding routing weighting factors per output port
> hop_weights_file (null)
>
> # Routing engine
> # Multiple routing engines can be specified separated by
> # commas so that specific ordering of routing algorithms will
> # be tried if earlier routing engines fail.
> # Supported engines: minhop, updn, file, ftree, lash, dor
> routing_engine (null)
>
> # Connect roots (use FALSE if unsure)
> connect_roots FALSE
>
> # Use unicast routing cache (use FALSE if unsure)
> use_ucast_cache FALSE
>
> # Lid matrix dump file name
> lid_matrix_dump_file (null)
>
> # LFTs file name
> lfts_file (null)
>
> # The file holding the root node guids (for fat-tree or Up/Down)
> # One guid in each line
> root_guid_file (null)
>
> # The file holding the fat-tree compute node guids
> # One guid in each line
> cn_guid_file (null)
>
> # The file holding the fat-tree I/O node guids
> # One guid in each line
> io_guid_file (null)
>
> # Number of reverse hops allowed for I/O nodes
> # Used for connectivity between I/O nodes connected to Top Switches
> max_reverse_hops 0
>
> # The file holding the node ids which will be used by Up/Down algorithm
> # instead of GUIDs (one guid and id in each line)
> ids_guid_file (null)
>
> # The file holding guid routing order guids (for MinHop and Up/Down)
> guid_routing_order_file (null)
>
> # Do mesh topology analysis (for LASH algorithm)
> do_mesh_analysis FALSE
>
> # Starting VL for LASH algorithm
> lash_start_vl 0
>
> # SA database file name
> sa_db_file /var/log/opensm-sa.dump
>
> # If TRUE causes OpenSM to dump SA database at the end of
> # every light sweep, regardless of the verbosity level
> sa_db_dump TRUE
>
> #
> # HANDOVER - MULTIPLE SMs OPTIONS
> #
> # SM priority used for deciding who is the master
> # Range goes from 0 (lowest priority) to 15 (highest).
> sm_priority 5
>
> # If TRUE other SMs on the subnet should be ignored
> ignore_other_sm FALSE
>
> # Timeout in [msec] between two polls of active master SM
> sminfo_polling_timeout 10000
>
> # Number of failing polls of remote SM that declares it dead
> polling_retry_number 4
>
> # If TRUE honor the guid2lid file when coming out of standby
> # state, if such file exists and is valid
> honor_guid2lid_file FALSE
>
> #
> # TIMING AND THREADING OPTIONS
> #
> # Maximum number of SMPs sent in parallel
> max_wire_smps 4
>
> # The maximum time in [msec] allowed for a transaction to complete
> transaction_timeout 2000
>
> # The maximum number of retries allowed for a transaction to complete
> transaction_retries 3
>
> # Maximal time in [msec] a message can stay in the incoming message queue.
> # If there is more than one message in the queue and the last message
> # stayed in the queue more than this value, any SA request will be
> # immediately returned with a BUSY status.
> max_msg_fifo_timeout 10000
>
> # Use a single thread for handling SA queries
> single_thread FALSE
>
> #
> # MISC OPTIONS
> #
> # Daemon mode
> daemon FALSE
>
> # SM Inactive
> sm_inactive FALSE
>
> # Babbling Port Policy
> babbling_port_policy FALSE
>
> #
> # Performance Manager Options
> #
> # perfmgr enable
> perfmgr FALSE
>
> # perfmgr redirection enable
> perfmgr_redir TRUE
>
> # sweep time in seconds
> perfmgr_sweep_time_s 180
>
> # Max outstanding queries
> perfmgr_max_outstanding_queries 500
>
> #
> # Event DB Options
> #
> # Dump file to dump the events to
> event_db_dump_file (null)
>
> #
> # Event Plugin Options
> #
> event_plugin_name (null)
>
> #
> # Node name map for mapping node's to more descriptive node descriptions
> # (man ibnetdiscover for more information)
> #
> node_name_map_name (null)
>
> #
> # DEBUG FEATURES
> #
> # The log flags used
> log_flags 0x03
>
> # Force flush of the log file after each log message
> force_log_flush FALSE
>
> # Log file to be used
> log_file /var/log/opensm.log
>
> # Limit the size of the log file in MB. If overrun, log is restarted
> log_max_size 100
>
> # If TRUE will accumulate the log over multiple OpenSM sessions
> accum_log_file TRUE
>
> # The directory to hold the file OpenSM dumps
> dump_files_dir /var/log/
>
> # If TRUE enables new high risk options and hardware specific quirks
> enable_quirks FALSE
>
> # If TRUE disables client reregistration
> no_clients_rereg FALSE
>
> # If TRUE OpenSM should disable multicast support and
> # no multicast routing is performed if TRUE
> disable_multicast FALSE
>
> # If TRUE opensm will exit on fatal initialization issues
> exit_on_fatal FALSE
>
> # console [off|local|loopback|socket]
> console loopback
>
> # Telnet port for console (default 10000)
> console_port 10000
>
> #
> # QoS OPTIONS
> #
> # Enable QoS setup
> qos FALSE
>
> # QoS policy file to be used
> qos_policy_file /opt/xsigo/xsigos/current/ofed/etc/opensm/qos-policy.conf
>
> # QoS default options
> qos_max_vls 0
> qos_high_limit -1
> qos_vlarb_high (null)
> qos_vlarb_low (null)
> qos_sl2vl (null)
>
> # QoS CA options
> qos_ca_max_vls 0
> qos_ca_high_limit -1
> qos_ca_vlarb_high (null)
> qos_ca_vlarb_low (null)
> qos_ca_sl2vl (null)
>
> # QoS Switch Port 0 options
> qos_sw0_max_vls 0
> qos_sw0_high_limit -1
> qos_sw0_vlarb_high (null)
> qos_sw0_vlarb_low (null)
> qos_sw0_sl2vl (null)
>
> # QoS Switch external ports options
> qos_swe_max_vls 0
> qos_swe_high_limit -1
> qos_swe_vlarb_high (null)
> qos_swe_vlarb_low (null)
> qos_swe_sl2vl (null)
>
> # QoS Router ports options
> qos_rtr_max_vls 0
> qos_rtr_high_limit -1
> qos_rtr_vlarb_high (null)
> qos_rtr_vlarb_low (null)
> qos_rtr_sl2vl (null)
>
> # Prefix routes file name
> prefix_routes_file /opt/xsigo/xsigos/current/ofed/etc/opensm/prefix-routes.conf
>
> #
> # IPv6 Solicited Node Multicast (SNM) Options
> #
> consolidate_ipv6_snm_req FALSE
>
> OpenSM $
>
> Thanks again for all your help.
>
>
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University
>
>
> On Wed, Nov 13, 2013 at 12:27 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:
>
>> They told me in the past that we could use our own external subnet
>> manager or the one built into their box.
>>
>>
>> Robert LeBlanc
>> OIT Infrastructure & Virtualization Engineer
>> Brigham Young University
>>
>>
>> On Wed, Nov 13, 2013 at 12:25 PM, Hal Rosenstock <
>> hal.rosenstock at gmail.com> wrote:
>>
>>> Yes but I'm not sure what the Xsigo SM "special sauce" is so those boxes
>>> may not function properly.
>>>
>>>
>>> On Wed, Nov 13, 2013 at 2:13 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:
>>>
>>>> The front line Oracle tech is giving me some hogwash that it is a
>>>> problem with Dell and Mellanox and that the subnet manager is not at fault
>>>> (although they are passing the request to engineering). I think I'm just
>>>> going to run OpenSM on this test node (after reducing the priority on the
>>>> Oracle sm) and see if the problem clears up.
>>>>
>>>>
>>>> Robert LeBlanc
>>>> OIT Infrastructure & Virtualization Engineer
>>>> Brigham Young University
>>>>
>>>>
>>>> On Wed, Nov 13, 2013 at 12:08 PM, Hal Rosenstock <
>>>> hal.rosenstock at gmail.com> wrote:
>>>>
>>>>> Yes, IPoIB uses multicast groups for the IP broadcast group and any IP
>>>>> multicast groups. You can see those with saquery -g. But depending on the
>>>>> locations of the ports running IPoIB and your topology, a multicast group
>>>>> may or may not be routed via a particular switch.
>>>>>
>>>>>
>>>>> On Wed, Nov 13, 2013 at 2:06 PM, Robert LeBlanc <
>>>>> robert_leblanc at byu.edu> wrote:
>>>>>
>>>>>> Ipoib uses multicast, right? I'm guessing that is why I can't get
>>>>>> ipoib to work on our blades but our rack servers can.
>>>>>>
>>>>>> Robert LeBlanc
>>>>>> Virtualization and Server Engineer
>>>>>> Brigham Young University
>>>>>>
>>>>>> Sent from a mobile device, please excuse any typos.
>>>>>> On Nov 13, 2013 12:03 PM, "Hal Rosenstock" <hal.rosenstock at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That should be fine. 7.4.3000 looks like the latest.
>>>>>>>
>>>>>>> This looks like an SM issue missetting that parameter in the switch
>>>>>>> assuming that there are some MC groups routed through that switch.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 13, 2013 at 1:55 PM, Robert LeBlanc <
>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>
>>>>>>>> [root at desxi003 ~]# flint -d
>>>>>>>> /dev/mst/SW_MT48438_0x2c90200448e28_lid-0x0034 q
>>>>>>>> Image type:      FS2
>>>>>>>> FW Version:      7.4.0
>>>>>>>> Device ID:       48438
>>>>>>>> Description:     Node             Sys image
>>>>>>>> GUIDs:           0002c90200448e28 0002c90200448e2b
>>>>>>>> Board ID:        n/a (DEL08D0110003)
>>>>>>>> VSD:             n/a
>>>>>>>> PSID:            DEL08D0110003
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Robert LeBlanc
>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>> Brigham Young University
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 13, 2013 at 11:52 AM, Hal Rosenstock <
>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> What's the latest firmware version ?
>>>>>>>>>
>>>>>>>>> Can you determine the firmware version of the switches ? vendstat
>>>>>>>>> -N <switch lid> might work to show this.
>>>>>>>>>
>>>>>>>>> This is important...
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> -- Hal
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Nov 13, 2013 at 1:46 PM, Robert LeBlanc <
>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for all the help so far, this is a great community! I've
>>>>>>>>>> fed all this info back to Oracle and I'll have to see what they say.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Robert LeBlanc
>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>> Brigham Young University
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 13, 2013 at 11:40 AM, Hal Rosenstock <
>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, this is the cause of the issues.
>>>>>>>>>>>
>>>>>>>>>>> smpdump (and smpquery) merely query (get) and don't set
>>>>>>>>>>> parameters and anyhow, the SM would overwrite it when it thought it needed
>>>>>>>>>>> to update it. It's an SM and/or firmware issue.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 13, 2013 at 1:38 PM, Robert LeBlanc <
>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We are on the latest version of firmware for all of our
>>>>>>>>>>>> switches (as of last month). I guess I'll have to check with Oracle and see
>>>>>>>>>>>> if they are setting this parameter in their subnet manager. Just to
>>>>>>>>>>>> confirm, using smpdump (or similar) to change the value won't do any good
>>>>>>>>>>>> because the subnet manager will just change it back?
>>>>>>>>>>>>
>>>>>>>>>>>> I think this is the cause of the problems, now to get it fixed.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Robert LeBlanc
>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 13, 2013 at 11:34 AM, Hal Rosenstock <
>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> In general, MulticastFDBTop should be 0 or some value above
>>>>>>>>>>>>> 0xc000.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indicates the upper bound of the range of the multicast
>>>>>>>>>>>>> forwarding table. Packets received with MLIDs greater than
>>>>>>>>>>>>> MulticastFDBTop are considered to be outside the range of the
>>>>>>>>>>>>> Multicast Forwarding Table (see 18.2.4.3.3 Required Multicast
>>>>>>>>>>>>> Relay on page 1072). A valid MulticastFDBTop is less than
>>>>>>>>>>>>> MulticastFDBCap + 0xC000. This component applies only to switches
>>>>>>>>>>>>> that implement the optional multicast forwarding service. A switch
>>>>>>>>>>>>> shall ignore the MulticastFDBTop component if it has the value
>>>>>>>>>>>>> zero. The initial value for MulticastFDBTop shall be set to zero.
>>>>>>>>>>>>> A value of 0xBFFF means there are no MulticastForwardingTable
>>>>>>>>>>>>> entries.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is set by OpenSM. There is a parameter to disable its use
>>>>>>>>>>>>> (use_mfttop) which can be set to FALSE. This may depend on which
>>>>>>>>>>>>> OpenSM version you are running. In order to get out of this state,
>>>>>>>>>>>>> you may need to reset any switches which have this parameter set
>>>>>>>>>>>>> like this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea on the firmware versions in your various switches ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 13, 2013 at 1:16 PM, Robert LeBlanc <
>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry to take so long, I've been busy with other things. Here
>>>>>>>>>>>>>> is the output:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root at desxi003 ~]# smpquery si 52
>>>>>>>>>>>>>> # Switch info: Lid 52
>>>>>>>>>>>>>> LinearFdbCap:....................49152
>>>>>>>>>>>>>> RandomFdbCap:....................0
>>>>>>>>>>>>>> McastFdbCap:.....................4096
>>>>>>>>>>>>>> LinearFdbTop:....................189
>>>>>>>>>>>>>> DefPort:.........................0
>>>>>>>>>>>>>> DefMcastPrimPort:................255
>>>>>>>>>>>>>> DefMcastNotPrimPort:.............255
>>>>>>>>>>>>>> LifeTime:........................18
>>>>>>>>>>>>>> StateChange:.....................0
>>>>>>>>>>>>>> OptSLtoVLMapping:................1
>>>>>>>>>>>>>> LidsPerPort:.....................0
>>>>>>>>>>>>>> PartEnforceCap:..................32
>>>>>>>>>>>>>> InboundPartEnf:..................1
>>>>>>>>>>>>>> OutboundPartEnf:.................1
>>>>>>>>>>>>>> FilterRawInbound:................1
>>>>>>>>>>>>>> FilterRawOutbound:...............1
>>>>>>>>>>>>>> EnhancedPort0:...................0
>>>>>>>>>>>>>> MulticastFDBTop:.................0xbfff
>>>>>>>>>>>>>> [root at desxi003 ~]# smpquery pi 52 0
>>>>>>>>>>>>>> # Port info: Lid 52 port 0
>>>>>>>>>>>>>> Mkey:............................0x0000000000000000
>>>>>>>>>>>>>> GidPrefix:.......................0xfe80000000000000
>>>>>>>>>>>>>> Lid:.............................52
>>>>>>>>>>>>>> SMLid:...........................49
>>>>>>>>>>>>>> CapMask:.........................0x42500848
>>>>>>>>>>>>>>                                 IsTrapSupported
>>>>>>>>>>>>>>                                 IsSLMappingSupported
>>>>>>>>>>>>>>                                 IsSystemImageGUIDsupported
>>>>>>>>>>>>>>                                 IsVendorClassSupported
>>>>>>>>>>>>>>                                 IsCapabilityMaskNoticeSupported
>>>>>>>>>>>>>>                                 IsClientRegistrationSupported
>>>>>>>>>>>>>>                                 IsMulticastFDBTopSupported
>>>>>>>>>>>>>> DiagCode:........................0x0000
>>>>>>>>>>>>>> MkeyLeasePeriod:.................0
>>>>>>>>>>>>>> LocalPort:.......................1
>>>>>>>>>>>>>> LinkWidthEnabled:................1X or 4X
>>>>>>>>>>>>>> LinkWidthSupported:..............1X or 4X
>>>>>>>>>>>>>> LinkWidthActive:.................4X
>>>>>>>>>>>>>> LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0
>>>>>>>>>>>>>> Gbps
>>>>>>>>>>>>>> LinkState:.......................Active
>>>>>>>>>>>>>> PhysLinkState:...................LinkUp
>>>>>>>>>>>>>> LinkDownDefState:................Polling
>>>>>>>>>>>>>> ProtectBits:.....................0
>>>>>>>>>>>>>> LMC:.............................0
>>>>>>>>>>>>>> LinkSpeedActive:.................10.0 Gbps
>>>>>>>>>>>>>> LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0
>>>>>>>>>>>>>> Gbps
>>>>>>>>>>>>>> NeighborMTU:.....................4096
>>>>>>>>>>>>>> SMSL:............................0
>>>>>>>>>>>>>> VLCap:...........................VL0
>>>>>>>>>>>>>> InitType:........................0x00
>>>>>>>>>>>>>> VLHighLimit:.....................0
>>>>>>>>>>>>>> VLArbHighCap:....................0
>>>>>>>>>>>>>> VLArbLowCap:.....................0
>>>>>>>>>>>>>> InitReply:.......................0x00
>>>>>>>>>>>>>> MtuCap:..........................4096
>>>>>>>>>>>>>> VLStallCount:....................0
>>>>>>>>>>>>>> HoqLife:.........................0
>>>>>>>>>>>>>> OperVLs:.........................VL0
>>>>>>>>>>>>>> PartEnforceInb:..................0
>>>>>>>>>>>>>> PartEnforceOutb:.................0
>>>>>>>>>>>>>> FilterRawInb:....................0
>>>>>>>>>>>>>> FilterRawOutb:...................0
>>>>>>>>>>>>>> MkeyViolations:..................0
>>>>>>>>>>>>>> PkeyViolations:..................0
>>>>>>>>>>>>>> QkeyViolations:..................0
>>>>>>>>>>>>>> GuidCap:.........................1
>>>>>>>>>>>>>> ClientReregister:................0
>>>>>>>>>>>>>> McastPkeyTrapSuppressionEnabled:.0
>>>>>>>>>>>>>> SubnetTimeout:...................18
>>>>>>>>>>>>>> RespTimeVal:.....................20
>>>>>>>>>>>>>> LocalPhysErr:....................0
>>>>>>>>>>>>>> OverrunErr:......................0
>>>>>>>>>>>>>> MaxCreditHint:...................0
>>>>>>>>>>>>>> RoundTrip:.......................0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From what I've read in the Mellanox Release
>>>>>>>>>>>>>> Notes MultiCastFDBTop=0xBFFF is supposed to discard MC traffic. The
>>>>>>>>>>>>>> question is, how do I set this value to something else and what should it
>>>>>>>>>>>>>> be set to?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Robert LeBlanc
>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 30, 2013 at 12:28 PM, Hal Rosenstock <
>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  Determine LID of switch (in the below say switch is lid x)
>>>>>>>>>>>>>>> Then:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> smpquery si x
>>>>>>>>>>>>>>> (of interest are McastFdbCap and MulticastFDBTop)
>>>>>>>>>>>>>>>  smpquery pi x 0
>>>>>>>>>>>>>>> (of interest is CapMask)
>>>>>>>>>>>>>>> ibroute -M x
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 29, 2013 at 3:56 PM, Robert LeBlanc <
>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Both ports show up in the "saquery MCMR" results with a
>>>>>>>>>>>>>>>> JoinState of 0x1.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How can I dump the parameters of a non-managed switch so
>>>>>>>>>>>>>>>> that I can confirm that multicast is not turned off on the Dell chassis IB
>>>>>>>>>>>>>>>> switches?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Robert LeBlanc
>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 5:04 PM, Coulter, Susan K <
>>>>>>>>>>>>>>>> skc at lanl.gov> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  /sys/class/net should give you the details on your
>>>>>>>>>>>>>>>>> devices, like this:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  -bash-4.1# cd /sys/class/net
>>>>>>>>>>>>>>>>> -bash-4.1# ls -l
>>>>>>>>>>>>>>>>> total 0
>>>>>>>>>>>>>>>>> lrwxrwxrwx 1 root root 0 Oct 23 12:59 eth0 ->
>>>>>>>>>>>>>>>>> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/net/eth0
>>>>>>>>>>>>>>>>> lrwxrwxrwx 1 root root 0 Oct 23 12:59 eth1 ->
>>>>>>>>>>>>>>>>> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.1/net/eth1
>>>>>>>>>>>>>>>>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib0 ->
>>>>>>>>>>>>>>>>> ../../devices/pci0000:40/0000:40:0c.0/0000:47:00.0/net/ib0
>>>>>>>>>>>>>>>>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib1 ->
>>>>>>>>>>>>>>>>> ../../devices/pci0000:40/0000:40:0c.0/0000:47:00.0/net/ib1
>>>>>>>>>>>>>>>>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib2 ->
>>>>>>>>>>>>>>>>> ../../devices/pci0000:c0/0000:c0:0c.0/0000:c7:00.0/net/ib2
>>>>>>>>>>>>>>>>> lrwxrwxrwx 1 root root 0 Oct 23 15:42 ib3 ->
>>>>>>>>>>>>>>>>> ../../devices/pci0000:c0/0000:c0:0c.0/0000:c7:00.0/net/ib3
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Then use "lspci | grep Mell"  to get the pci device
>>>>>>>>>>>>>>>>> numbers.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  47:00.0 Network controller: Mellanox Technologies
>>>>>>>>>>>>>>>>> MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
>>>>>>>>>>>>>>>>> c7:00.0 Network controller: Mellanox Technologies MT26428
>>>>>>>>>>>>>>>>> [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  In this example, ib0 and 1 are referencing the device at
>>>>>>>>>>>>>>>>>  47:00.0
>>>>>>>>>>>>>>>>> And ib2 and ib3 are referencing the device at c7:00.0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  That said, if you only have one card - this is probably
>>>>>>>>>>>>>>>>> not the problem.
>>>>>>>>>>>>>>>>> Additionally, since the arp requests are being seen going
>>>>>>>>>>>>>>>>> out ib0, your emulation appears to be working.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  If those arp requests are not being seen on the other
>>>>>>>>>>>>>>>>> end, it seems like a problem with the mgids.
>>>>>>>>>>>>>>>>> Like maybe the port you are trying to reach is not in the
>>>>>>>>>>>>>>>>> IPoIB multicast group?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  You can look at all the multicast member records with
>>>>>>>>>>>>>>>>> "saquery MCMR".
>>>>>>>>>>>>>>>>> Or - you can grep for mcmr_rcv_join_mgrp references in
>>>>>>>>>>>>>>>>> your SM logs …
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  HTH
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  On Oct 28, 2013, at 1:08 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  I can ibping between both hosts just fine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  [root at desxi003 ~]# ibping 0x37
>>>>>>>>>>>>>>>>> Pong from desxi004.(none) (Lid 55): time 0.111 ms
>>>>>>>>>>>>>>>>> Pong from desxi004.(none) (Lid 55): time 0.189 ms
>>>>>>>>>>>>>>>>> Pong from desxi004.(none) (Lid 55): time 0.189 ms
>>>>>>>>>>>>>>>>> Pong from desxi004.(none) (Lid 55): time 0.179 ms
>>>>>>>>>>>>>>>>> ^C
>>>>>>>>>>>>>>>>> --- desxi004.(none) (Lid 55) ibping statistics ---
>>>>>>>>>>>>>>>>> 4 packets transmitted, 4 received, 0% packet loss, time
>>>>>>>>>>>>>>>>> 3086 ms
>>>>>>>>>>>>>>>>> rtt min/avg/max = 0.111/0.167/0.189 ms
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  [root at desxi004 ~]# ibping 0x2d
>>>>>>>>>>>>>>>>> Pong from desxi003.(none) (Lid 45): time 0.156 ms
>>>>>>>>>>>>>>>>> Pong from desxi003.(none) (Lid 45): time 0.175 ms
>>>>>>>>>>>>>>>>> Pong from desxi003.(none) (Lid 45): time 0.176 ms
>>>>>>>>>>>>>>>>> ^C
>>>>>>>>>>>>>>>>> --- desxi003.(none) (Lid 45) ibping statistics ---
>>>>>>>>>>>>>>>>> 3 packets transmitted, 3 received, 0% packet loss, time
>>>>>>>>>>>>>>>>> 2302 ms
>>>>>>>>>>>>>>>>> rtt min/avg/max = 0.156/0.169/0.176 ms
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  When I do an Ethernet ping to the IPoIB address, tcpdump
>>>>>>>>>>>>>>>>> only shows the outgoing ARP request.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  [root at desxi003 ~]# tcpdump -i ib0
>>>>>>>>>>>>>>>>> tcpdump: verbose output suppressed, use -v or -vv for full
>>>>>>>>>>>>>>>>> protocol decode
>>>>>>>>>>>>>>>>> listening on ib0, link-type LINUX_SLL (Linux cooked),
>>>>>>>>>>>>>>>>> capture size 65535 bytes
>>>>>>>>>>>>>>>>> 19:00:08.950320 ARP, Request who-has 192.168.9.4 tell
>>>>>>>>>>>>>>>>> 192.168.9.3, length 56
>>>>>>>>>>>>>>>>> 19:00:09.950320 ARP, Request who-has 192.168.9.4 tell
>>>>>>>>>>>>>>>>> 192.168.9.3, length 56
>>>>>>>>>>>>>>>>> 19:00:10.950307 ARP, Request who-has 192.168.9.4 tell
>>>>>>>>>>>>>>>>> 192.168.9.3, length 56
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Running tcpdump on the rack servers I don't see the ARP
>>>>>>>>>>>>>>>>> request there which I should.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  From what I've read, ib0 should be mapped to the first
>>>>>>>>>>>>>>>>> port and ib1 should be mapped to the second port. We have one IB card with
>>>>>>>>>>>>>>>>> two ports. The modprobe is the default installed with the Mellanox drivers.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  [root at desxi003 etc]# cat modprobe.d/ib_ipoib.conf
>>>>>>>>>>>>>>>>> # install ib_ipoib modprobe --ignore-install ib_ipoib &&
>>>>>>>>>>>>>>>>> /sbin/ib_ipoib_sysctl load
>>>>>>>>>>>>>>>>> # remove ib_ipoib /sbin/ib_ipoib_sysctl unload ; modprobe
>>>>>>>>>>>>>>>>> -r --ignore-remove ib_ipoib
>>>>>>>>>>>>>>>>> alias ib0 ib_ipoib
>>>>>>>>>>>>>>>>> alias ib1 ib_ipoib
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Can you give me some pointers on digging into the device
>>>>>>>>>>>>>>>>> layer to make sure IPoIB is connected correctly? Would I look in /sys or
>>>>>>>>>>>>>>>>> /proc for that?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Dell has not been able to replicate the problem in their
>>>>>>>>>>>>>>>>> environment and they only support Red Hat and won't work with my CentOS
>>>>>>>>>>>>>>>>> live CD. These blades don't have internal hard drives so it makes it hard
>>>>>>>>>>>>>>>>> to install any OS. I don't know if I can engage Mellanox since they build
>>>>>>>>>>>>>>>>> the switch hardware and driver stack we are using.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  I really appreciate all the help you guys have given
>>>>>>>>>>>>>>>>> thus far, I'm learning a lot as this progresses. I'm reading through
>>>>>>>>>>>>>>>>> https://tools.ietf.org/html/rfc4391 trying to understand
>>>>>>>>>>>>>>>>> IPoIB from top to bottom.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 12:53 PM, Coulter, Susan K <
>>>>>>>>>>>>>>>>> skc at lanl.gov> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  If you are not seeing any packets leave the ib0
>>>>>>>>>>>>>>>>>> interface, it sounds like the emulation layer is not connected to the right
>>>>>>>>>>>>>>>>>> device.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  If ib_ipoib kernel module is loaded, and a simple
>>>>>>>>>>>>>>>>>> native IB test works between those blades - (like ib_read_bw) you need to
>>>>>>>>>>>>>>>>>> dig into the device layer and insure ipoib is "connected" to the right
>>>>>>>>>>>>>>>>>> device.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Do you have more than 1 IB card?
>>>>>>>>>>>>>>>>>> What does your modprobe config look like for ipoib?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   On Oct 28, 2013, at 12:38 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>>>> robert_leblanc at byu.edu>
>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  These ESX hosts (2 blade server and 2 rack servers) are
>>>>>>>>>>>>>>>>>> booted into a CentOS 6.2 Live CD that I built. Right now everything I'm
>>>>>>>>>>>>>>>>>> trying to get working is CentOS 6.2. All of our other hosts are running
>>>>>>>>>>>>>>>>>> ESXi and have IPoIB interfaces, but none of them are configured and I'm not
>>>>>>>>>>>>>>>>>> trying to get those working right now.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Ideally, we would like our ESX hosts to communicate
>>>>>>>>>>>>>>>>>> with each other for vMotion and protected VM traffic as well as with our
>>>>>>>>>>>>>>>>>> Commvault backup servers (Windows) over IPoIB (or Oracle's PVI which is
>>>>>>>>>>>>>>>>>> very similar).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 12:33 PM, Hal Rosenstock <
>>>>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are those ESXi IPoIB interfaces ? Do some of these work
>>>>>>>>>>>>>>>>>>> and others not ? Are there normal Linux IPoIB interfaces ? Do they work ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 2:24 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, I can not ping them over the IPoIB interface. It
>>>>>>>>>>>>>>>>>>>> is a very simple network set-up.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  desxi003
>>>>>>>>>>>>>>>>>>>>  8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520
>>>>>>>>>>>>>>>>>>>> qdisc pfifo_fast state UP qlen 256
>>>>>>>>>>>>>>>>>>>>     link/infiniband
>>>>>>>>>>>>>>>>>>>> 80:20:00:54:fe:80:00:00:00:00:00:00:f0:4d:a2:90:97:78:e7:d1 brd
>>>>>>>>>>>>>>>>>>>> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>>>>>>>>>>>>>>>>>>>>     inet 192.168.9.3/24 brd 192.168.9.255 scope global
>>>>>>>>>>>>>>>>>>>> ib0
>>>>>>>>>>>>>>>>>>>>     inet6 fe80::f24d:a290:9778:e7d1/64 scope link
>>>>>>>>>>>>>>>>>>>>        valid_lft forever preferred_lft forever
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  desxi004
>>>>>>>>>>>>>>>>>>>>  8: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520
>>>>>>>>>>>>>>>>>>>> qdisc pfifo_fast state UP qlen 256
>>>>>>>>>>>>>>>>>>>>     link/infiniband
>>>>>>>>>>>>>>>>>>>> 80:20:00:54:fe:80:00:00:00:00:00:00:f0:4d:a2:90:97:78:e7:15 brd
>>>>>>>>>>>>>>>>>>>> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>>>>>>>>>>>>>>>>>>>>     inet 192.168.9.4/24 brd 192.168.9.255 scope global
>>>>>>>>>>>>>>>>>>>> ib0
>>>>>>>>>>>>>>>>>>>>     inet6 fe80::f24d:a290:9778:e715/64 scope link
>>>>>>>>>>>>>>>>>>>>        valid_lft forever preferred_lft forever
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 12:22 PM, Hal Rosenstock <
>>>>>>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So these 2 hosts have trouble talking IPoIB to each
>>>>>>>>>>>>>>>>>>>>> other ?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 2:16 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I was just wondering about that. It seems reasonable
>>>>>>>>>>>>>>>>>>>>>> that the broadcast traffic would go over multicast, but effectively
>>>>>>>>>>>>>>>>>>>>>> channels would be created for node to node communication, otherwise the
>>>>>>>>>>>>>>>>>>>>>> entire multicast group would be limited to 10 Gbps (in this instance) for
>>>>>>>>>>>>>>>>>>>>>> the whole group. That doesn't scale very well.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  The things I've read about IPoIB performance tuning
>>>>>>>>>>>>>>>>>>>>>> seems pretty vague, and the changes most people recommend seem to be
>>>>>>>>>>>>>>>>>>>>>> already in place on the systems I'm using. Some people said, try using a
>>>>>>>>>>>>>>>>>>>>>> newer version of Ubuntu, but ultimately, I have very little control over
>>>>>>>>>>>>>>>>>>>>>> VMware. Once I can get the Linux machines to communicate IPoIB between the
>>>>>>>>>>>>>>>>>>>>>> racks and blades, then I'm going to turn my attention over to performance
>>>>>>>>>>>>>>>>>>>>>> optimization. It doesn't seem to make much sense to spend time there when
>>>>>>>>>>>>>>>>>>>>>> it is not working at all for most machines.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  I've done ibtracert between the two nodes, is that
>>>>>>>>>>>>>>>>>>>>>> what you mean by walking the route?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  [root at desxi003 ~]# ibtracert -m 0xc000 0x2d 0x37
>>>>>>>>>>>>>>>>>>>>>> From ca 0xf04da2909778e7d0 port 1 lid 45-45
>>>>>>>>>>>>>>>>>>>>>> "localhost HCA-1"
>>>>>>>>>>>>>>>>>>>>>> [1] -> switch 0x2c90200448ec8[17] lid 51
>>>>>>>>>>>>>>>>>>>>>> "Infiniscale-IV Mellanox Technologies"
>>>>>>>>>>>>>>>>>>>>>> [18] -> ca 0xf04da2909778e714[1] lid 55 "localhost
>>>>>>>>>>>>>>>>>>>>>> HCA-1"
>>>>>>>>>>>>>>>>>>>>>> To ca 0xf04da2909778e714 port 1 lid 55-55 "localhost
>>>>>>>>>>>>>>>>>>>>>> HCA-1"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  [root at desxi004 ~]# ibtracert -m 0xc000 0x37 0x2d
>>>>>>>>>>>>>>>>>>>>>> From ca 0xf04da2909778e714 port 1 lid 55-55
>>>>>>>>>>>>>>>>>>>>>> "localhost HCA-1"
>>>>>>>>>>>>>>>>>>>>>> [1] -> switch 0x2c90200448ec8[18] lid 51
>>>>>>>>>>>>>>>>>>>>>> "Infiniscale-IV Mellanox Technologies"
>>>>>>>>>>>>>>>>>>>>>> [17] -> ca 0xf04da2909778e7d0[1] lid 45 "localhost
>>>>>>>>>>>>>>>>>>>>>> HCA-1"
>>>>>>>>>>>>>>>>>>>>>> To ca 0xf04da2909778e7d0 port 1 lid 45-45 "localhost
>>>>>>>>>>>>>>>>>>>>>> HCA-1"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  As you can see, the route is on the same switch,
>>>>>>>>>>>>>>>>>>>>>> the blades are right next to each other.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 12:05 PM, Hal Rosenstock <
>>>>>>>>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>  Which mystery is explained ? The 10 Gbps is a
>>>>>>>>>>>>>>>>>>>>>>> multicast only limit and does not apply to unicast. The BW limitation
>>>>>>>>>>>>>>>>>>>>>>> you're seeing is due to other factors. There's been much written about
>>>>>>>>>>>>>>>>>>>>>>> IPoIB performance.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If all the MC members are joined and routed, then
>>>>>>>>>>>>>>>>>>>>>>> the IPoIB connectivity issue is some other issue. Are you sure this is the
>>>>>>>>>>>>>>>>>>>>>>> case ? Did you walk the route between 2 nodes where you have a connectivity
>>>>>>>>>>>>>>>>>>>>>>> issue ?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 1:58 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Well, that explains one mystery, now I need to
>>>>>>>>>>>>>>>>>>>>>>>> figure out why it seems the Dell blades are not passing the traffic.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 11:51 AM, Hal Rosenstock <
>>>>>>>>>>>>>>>>>>>>>>>> hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  Yes, that's the IPoIB IPv4 broadcast group for
>>>>>>>>>>>>>>>>>>>>>>>>> the default (0xffff) partition. 0x80 part of mtu and rate just means "is
>>>>>>>>>>>>>>>>>>>>>>>>> equal to". mtu 0x04 is 2K (2048) and rate 0x3 is 10 Gb/sec. These are
>>>>>>>>>>>>>>>>>>>>>>>>> indeed the defaults.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 1:45 PM, Robert LeBlanc <
>>>>>>>>>>>>>>>>>>>>>>>>> robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The info for that MGID is:
>>>>>>>>>>>>>>>>>>>>>>>>>> MCMemberRecord group dump:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> MGID....................ff12:401b:ffff::ffff:ffff
>>>>>>>>>>>>>>>>>>>>>>>>>>                 Mlid....................0xC000
>>>>>>>>>>>>>>>>>>>>>>>>>>                 Mtu.....................0x84
>>>>>>>>>>>>>>>>>>>>>>>>>>                 pkey....................0xFFFF
>>>>>>>>>>>>>>>>>>>>>>>>>>                 Rate....................0x83
>>>>>>>>>>>>>>>>>>>>>>>>>>                 SL......................0x0
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>  I don't understand the MTU and Rate (130 and
>>>>>>>>>>>>>>>>>>>>>>>>>> 131 dec). When I run iperf between the two hosts over IPoIB in connected
>>>>>>>>>>>>>>>>>>>>>>>>>> mode and MTU 65520. I've tried multiple threads, but the sum is still 10
>>>>>>>>>>>>>>>>>>>>>>>>>> Gbps.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 11:40 AM, Hal
>>>>>>>>>>>>>>>>>>>>>>>>>> Rosenstock <hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>  saquery -g should show what MGID is mapped to
>>>>>>>>>>>>>>>>>>>>>>>>>>> MLID 0xc000 and the group parameters.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>  When you say 10 Gbps max, is that multicast or
>>>>>>>>>>>>>>>>>>>>>>>>>>> unicast ? That limit is only on the multicast.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 1:28 PM, Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>>>>>>> <robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well, that can explain why I'm only able to get
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 10 Gbps max from the two hosts that are working.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>  I have tried updn and dnup and they didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>> help either. I think the only thing that will help is Automatic Path
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Migration is it tries very hard to route the alternative LIDs through
>>>>>>>>>>>>>>>>>>>>>>>>>>>> different systemguids. I suspect it would require re-LIDing everything
>>>>>>>>>>>>>>>>>>>>>>>>>>>> which would mean an outage. I'm still trying to get answers from Oracle if
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that is even a possibility. I've tried seeding some of the algorithms with
>>>>>>>>>>>>>>>>>>>>>>>>>>>> information like root nodes, etc, but none of them worked better.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>  The MLID 0xc000 exists and I can see all the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> nodes joined to the group using saquery. I've checked the route using
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ibtracert specifying the MLID. The only thing I'm not sure how to check is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the group parameters. What tool would I use for that?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 11:16 AM, Hal
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Rosenstock <hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Xsigo's SM is not "straight" OpenSM. They
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have some proprietary enhancements and it may be based on old vintage of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OpenSM. You will likely need to work with them/Oracle now on issues.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lack of a partitions file does mean default
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partition and default rate (10 Gbps) so from what I saw all ports had
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sufficient rate to join MC group.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are certain topology requirements for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> running various routing algorithms. Did you try updn or dnup ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The key is determining whether the IPoIB
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> broadcast group is setup correctly. What MLID is the group built on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (usually 0xc000) ? What are the group parameters (rate, MTU) ? Are all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> members that are running IPoIB joined ? Is the group routed to all such
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> members ? There are infiniband-diags for all of this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 28, 2013 at 12:19 PM, Robert
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LeBlanc <robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OpenSM (the SM runs on Xsigo so they manage
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it) is using minhop. I've loaded the ibnetdiscover output into ibsim and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run all the different routing algorithms against it with and without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scatter ports. Minhop had 50% of our hosts running all paths through a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> single IS5030 switch (at least the LIDs we need which represent Ethernet
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and Fibre Channel cards the hosts should communicate with). Ftree, dor, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dfsssp failed back to minhop, the others routed more paths through the same
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IS5030 in some cases increasing our host count with single point of failure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to 75%.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  As far as I can tell there is no
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> partitions.conf file so I assume we are using the default partition. There
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is an opensm.opts file, but it only specifies logging information.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  # SA database file name
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sa_db_file /var/log/opensm-sa.dump
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  # If TRUE causes OpenSM to dump SA database
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the end of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> # every light sweep, regardless of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> verbosity level
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sa_db_dump TRUE
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  # The directory to hold the file OpenSM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dumps
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dump_files_dir /var/log/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  The SM node is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  xsigoa:/opt/xsigo/xsigos/current/ofed/etc#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ibaddr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GID fe80::13:9702:100:979 LID start 0x1 end
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0x1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  We do have Switch-X in two of the Dell
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> m1000e chassis but the cards, ports 17-32, are FDR10 (the switch may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> straight FDR, but I'm not 100% sure). The IS5030 are QDR which the Switch-X
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are connected to, the switches in the Xsigo directors are QDR, but the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ethernet and Fibre Channel cards are DDR. The DDR cards will not be running
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IPoIB (at least to my knowledge they don't have the ability), only the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hosts should be leveraging IPoIB. I hope that clears up some of your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> questions. If you have more, I will try to answer them.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  On Mon, Oct 28, 2013 at 9:57 AM, Hal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Rosenstock <hal.rosenstock at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  What routing algorithm is configured in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OpenSM ? What does your partitions.conf file look like ? Which node is your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OpenSM ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, I only see QDR and DDR links although
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you have Switch-X so I assume all FDR ports are connected to slower (QDR)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> devices. I don't see any FDR-10 ports but maybe they're also connected to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> QDR ports so show up as QDR in the topology.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are DDR CAs in Xsigo box but not sure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> whether or not they run IPoIB.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- Hal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  On Sun, Oct 27, 2013 at 9:46 PM, Robert
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LeBlanc <robert_leblanc at byu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Since you guys are amazingly helpful, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thought I would pick your brains in a new problem.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  We have two Xsigo directors cross
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connected to four Mellanox IS5030 switches. Connected to those we have four
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dell m1000e chassis each with two IB switches (two chassis have QDR and two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have FDR10). We have 9 dual-port rack servers connected to the IS5030
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> switches. For testing purposes we have an additional Dell m1000e QDR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> chassis connected to one Xsigo director and two dual-port FDR10 rack
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> servers connected to the other Xsigo director.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  I can get IPoIB to work between the two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> test rack servers connected to the one Xsigo director. But I can not get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IPoIB to work between any blades either right next to each other or to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working rack servers. I'm using the same exact live CentOS ISO on all four
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> servers. I've checked opensm and the blades have joined the multicast group
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xc000 properly. tcpdump basically says that traffic is not leaving the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> blades. tcpdump also shows no traffic entering the blades from the rack
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> servers. An ibtracert using 0xc000 mlid shows that routing exists between
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hosts.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  I've read about
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MulticastFDBTop=0xBFFF but I don't know how to set it and I doubt it would
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have been set by default.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Anyone have some ideas on troubleshooting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> steps to try? I think Google is tired of me asking questions about it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  Robert LeBlanc
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OIT Infrastructure & Virtualization Engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Brigham Young University
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>> Users at lists.openfabrics.org
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  ====================================
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Susan Coulter
>>>>>>>>>>>>>>>>>> HPC-3 Network/Infrastructure
>>>>>>>>>>>>>>>>>> 505-667-8425
>>>>>>>>>>>>>>>>>> Increase the Peace...
>>>>>>>>>>>>>>>>>> An eye for an eye leaves the whole world blind
>>>>>>>>>>>>>>>>>> ====================================
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  ====================================
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Susan Coulter
>>>>>>>>>>>>>>>>> HPC-3 Network/Infrastructure
>>>>>>>>>>>>>>>>> 505-667-8425
>>>>>>>>>>>>>>>>> Increase the Peace...
>>>>>>>>>>>>>>>>> An eye for an eye leaves the whole world blind
>>>>>>>>>>>>>>>>> ====================================
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>