[Users] Subnet question

Robert LeBlanc robert_leblanc at byu.edu
Tue Oct 8 10:04:57 PDT 2013


We have been running Oracle OVN (previously Xsigo) in our data center
environment for two years now. This past week we upgraded the firmware on
our Mellanox IS5030 switches to see if that would help resolve some
communication issues for Oracle PVI and IPoIB. Oracle OVN creates
virtual NICs and virtual HBAs, encapsulates their traffic, and sends it over
the InfiniBand fabric to the directors, where the data is decapsulated
and sent out on the traditional Ethernet and Fibre Channel networks.

During our upgrade, it turned out that all four of our vHBAs were routed
through the same IS5030 switch, so all of the storage for some of our ESX
hosts disappeared when that switch was rebooted. This caused an APD (All
Paths Down) state and some of the VMs suffered corruption. We are now
looking for ways to make sure the routing spreads routes evenly across the
available paths, to help reduce or prevent this in the future. We want an
algorithm that focuses on availability and is part of the standard OFED
OpenSM. We are looking for stability over cutting edge. We are currently
using MinHop for the routing algorithm. I'm
attaching a diagram (
https://docs.google.com/drawings/d/18pMOpiM7Bz2kaiyI0NOzB1q5o-0Zcy-9E1hJYNm6uNg/edit?usp=sharing)
of our environment, as I know different algorithms are tailored for
different topologies.

We are also looking to extract our topology, load it into IBMgtSim, and
run simulations on MinHop and other algorithms to see what the probability
of having all paths run through one switch is. If you have any pointers, we
would be glad to accept them. One difficulty is that when I run
ibnetdiscover, it shows me the port GUIDs of the HCAs, but not the node
GUID of the card. I suppose that if I see CA, I can subtract the port
number from the port GUID to get the node GUID. Would that be a safe
assumption?

>ibnetdiscover
CA    56  2 0xf04da2909778e716 4x QDR - SW    52 18 0x0002c90200448e28 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
CA    55  1 0xf04da2909778e715 4x QDR - SW    51 18 0x0002c90200448ec8 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
...snip...
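
As a starting point for extracting the topology, here is a minimal Python
sketch that groups CA ports by the switch they attach to. It assumes the
columns in the link lines above are node type, LID, port number, GUID, link
width, and link speed on each side of the " - ". That reading of the format
is an assumption, and the function name ca_ports_by_switch is made up for
illustration. Lines that do not match the pattern (such as wrapped
description text) are skipped.

```python
import re
from collections import defaultdict

# Assumed layout of one link line (not taken from any documentation):
#   TYPE LID PORT GUID WIDTH SPEED - TYPE LID PORT GUID ( 'desc' - 'desc' )
LINK_RE = re.compile(
    r"^(?P<type>CA|SW|RT)\s+(?P<lid>\d+)\s+(?P<port>\d+)\s+"
    r"(?P<guid>0x[0-9a-fA-F]+)\s+\S+\s+\S+\s+-\s+"
    r"(?P<peer_type>CA|SW|RT)\s+(?P<peer_lid>\d+)\s+(?P<peer_port>\d+)\s+"
    r"(?P<peer_guid>0x[0-9a-fA-F]+)"
)


def ca_ports_by_switch(text):
    """Map each switch GUID to the (CA GUID, CA port) pairs attached to it."""
    by_switch = defaultdict(list)
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        # Keep only CA-to-switch links; ignore anything that does not parse.
        if m and m["type"] == "CA" and m["peer_type"] == "SW":
            by_switch[m["peer_guid"]].append((m["guid"], int(m["port"])))
    return dict(by_switch)


# The two link lines quoted above, with the wrapped text rejoined.
SAMPLE = """\
CA    56  2 0xf04da2909778e716 4x QDR - SW    52 18 0x0002c90200448e28 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
CA    55  1 0xf04da2909778e715 4x QDR - SW    51 18 0x0002c90200448ec8 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
"""

if __name__ == "__main__":
    for switch_guid, ca_ports in ca_ports_by_switch(SAMPLE).items():
        print(switch_guid, ca_ports)
```

On the two sample lines, each CA port lands on a different switch GUID. A
switch whose list contained every port of a host would be a single point of
failure for that host at the physical link level. This says nothing about
how the routing engine distributes paths deeper in the fabric.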

>ibhosts
Ca      : 0xf04da2909778e714 ports 2 "MT25408 ConnectX Mellanox Technologies"
...snip...
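
To make the GUID question concrete, here is a small Python check of the
subtraction against the values quoted above. The helper name
guess_node_guid is made up for illustration. The rule it encodes (port GUID
minus port number equals node GUID) is only the assumption being asked
about; the sample shows it holding for this one ConnectX card, which does
not make it a guarantee for other HCAs or vendors.

```python
def guess_node_guid(port_guid: str, port_num: int) -> str:
    """Guess the node GUID by subtracting the port number from the port GUID.

    This encodes an assumption about how the vendor numbers its GUIDs,
    not a documented rule.
    """
    return f"0x{int(port_guid, 16) - port_num:016x}"


# (port GUID, port number) pairs from the ibnetdiscover lines above.
CA_PORTS = [
    ("0xf04da2909778e716", 2),
    ("0xf04da2909778e715", 1),
]

# Node GUID reported by ibhosts for the same card.
IBHOSTS_NODE_GUID = "0xf04da2909778e714"

if __name__ == "__main__":
    for port_guid, port_num in CA_PORTS:
        guess = guess_node_guid(port_guid, port_num)
        print(port_guid, port_num, guess, guess == IBHOSTS_NODE_GUID)
```

Both ports of the sample card reduce to the node GUID that ibhosts reports,
so the subtraction is at least consistent with the data shown here.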


Thank you in advance for reading and helping us.

Robert LeBlanc
OIT Infrastructure & Virtualization Engineer
Brigham Young University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fabric Design Public.pdf
Type: application/pdf
Size: 28865 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/users/attachments/20131008/62135d48/attachment.pdf>