[Users] Subnet question

Kevin Harms harms at alcf.anl.gov
Tue Oct 8 10:35:58 PDT 2013


  This is not a full solution to your problem, but you can set
  scatter_ports 37
  in your opensm.conf. This generally improves the distribution of paths over the available ports. It does not guarantee that the routes to a node N never all flow through a single switch, but it makes that outcome much less likely. The value 37 is the initial seed for the algorithm. In our case, we did this to improve performance.
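
To give a rough feel for "much less likely", here is a toy calculation. It is only a sketch under strong assumptions that are not taken from your fabric: each path independently picks one of several equally good switches uniformly at random. Real OpenSM port selection is not this simple, and the switch count of 2 used below is hypothetical; only the figure of four vHBAs comes from your description.

```python
from fractions import Fraction


def p_all_same_switch(num_paths: int, num_switches: int) -> Fraction:
    """Probability that every path lands on the same switch.

    Toy model (an assumption, not OpenSM's actual behaviour): each path
    independently picks one of num_switches switches uniformly at random.
    The first path may pick any switch; each of the remaining paths must
    match it, which happens with probability 1 / num_switches.
    """
    if num_paths < 1 or num_switches < 1:
        raise ValueError("num_paths and num_switches must be positive")
    return Fraction(1, num_switches) ** (num_paths - 1)


if __name__ == "__main__":
    # Four vHBAs as in the original report; two switches is a made-up example.
    print(p_all_same_switch(4, 2))  # prints 1/8
```

Even in this simplified model the chance is not zero, which is why randomizing the port choice reduces the risk without removing it.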

kevin

On Oct 8, 2013, at 12:04 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:

> We have been running Oracle OVN (previously Xsigo) in our data center
> environment for two years now. This last week we upgraded the firmware on
> our Mellanox IS5030 switches to see if that would help resolve some
> communication issues for Oracle PVI and IPoIB. The Oracle OVN creates
> virtual NICs and virtual HBAs and encapsulates the traffic, sends it over
> the Infiniband fabric and to the directors where the data is unencapsulated
> and sent out on the traditional Ethernet and Fibre Channel networks.
> 
> During our upgrade, all four of our vHBAs happened to be routed through the
> same IS5030 switch, so all of the storage for some of our ESX hosts
> disappeared when that switch was rebooted. This caused an APD (All Paths
> Down) state, and some of the VMs suffered corruption. We are now
> looking for ways that we can make sure the routing tries to evenly spread
> out all of the routes between available paths to help reduce/prevent this
> in the future. We would want an algorithm that focuses on availability and
> is part of the standard OFED openSM. We are looking for stability over
> cutting edge. We are currently using MinHop for the routing algorithm. I'm
> attaching a diagram (
> https://docs.google.com/drawings/d/18pMOpiM7Bz2kaiyI0NOzB1q5o-0Zcy-9E1hJYNm6uNg/edit?usp=sharing)
> of our environment as I know different algorithms are tailored for
> different environments.
> 
> We are also looking to extract our topology, load it into IBMgtSim, and
> run simulations on MinHop and other algorithms to see what the probability
> of having all paths run through one switch is. If you have any pointers,
> we would be glad to accept them. One difficulty is that when I run
> ibnetdiscover, it shows me the port GUIDs of the HCAs, but not the node
> GUID of the card. I suppose that if I see CA, I can subtract the port
> number from the port GUID to get the node GUID. Would that be a safe
> assumption?
> 
>> ibnetdiscover
> CA    56  2 0xf04da2909778e716 4x QDR - SW    52 18 0x0002c90200448e28 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
> CA    55  1 0xf04da2909778e715 4x QDR - SW    51 18 0x0002c90200448ec8 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
> ...snip...
> 
>> ibhosts
> Ca      : 0xf04da2909778e714 ports 2 "MT25408 ConnectX Mellanox Technologies"
> ...snip...
> 
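
For the two sample lines quoted above, the subtraction does reproduce the node GUID that ibhosts reports. The sketch below only checks that arithmetic. It assumes the port GUID equals the node GUID plus the port number, which matches these ConnectX samples but is a vendor numbering convention rather than a guarantee, so treat it as a sanity check and not as a general rule. The function name is made up for illustration.

```python
def node_guid_from_port(port_guid: int, port_num: int) -> int:
    """Guess the node GUID as port GUID minus port number.

    Assumption: the HCA numbers its port GUIDs as node GUID + port number.
    This holds for the sample output in this thread, nothing more.
    """
    return port_guid - port_num


# (port GUID, port number) pairs copied from the ibnetdiscover output above.
samples = [
    (0xF04DA2909778E716, 2),
    (0xF04DA2909778E715, 1),
]

# Node GUID copied from the ibhosts output above.
expected_node_guid = 0xF04DA2909778E714

for port_guid, port_num in samples:
    guess = node_guid_from_port(port_guid, port_num)
    print(hex(port_guid), "port", port_num, "->", hex(guess),
          "match" if guess == expected_node_guid else "MISMATCH")
```

Both sample ports map to 0xf04da2909778e714. A card whose GUIDs have been reassigned, or one from a vendor with a different numbering scheme, could break the assumption, so it is safer to cross-check against ibhosts as done here.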
> 
> Thank you in advance for reading and helping us.
> 
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University
> <Fabric Design Public.pdf>