[Users] Subnet question

Hal Rosenstock hal.rosenstock at gmail.com
Tue Oct 8 12:00:11 PDT 2013


Are you running Xsigo hardware?


On Tue, Oct 8, 2013 at 1:38 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:

> Kevin,
>
> Thanks for the input, I'll look into it. Does scatter_ports work with
> MinHop, or do I need to use UpDown? The only place I've seen scatter ports
> mentioned is:
> http://www2.cisl.ucar.edu/sites/default/files/Mizero,%20F_SIParCS2013.pdf
>
> Thanks,
>
>
> Robert LeBlanc
> OIT Infrastructure & Virtualization Engineer
> Brigham Young University
>
>
> On Tue, Oct 8, 2013 at 11:35 AM, Kevin Harms <harms at alcf.anl.gov> wrote:
>
>>
>>   This is not a full solution to your problem, but you can set the value
>>   scatter_ports 37
>>   in your opensm.conf. This generally improves the distribution of
>> paths over the available ports. It does not guarantee that the routes to
>> a node N avoid sharing a single switch, but it makes that much less
>> likely. The value 37 is the initial seed for the algorithm. In our case,
>> we did this to improve performance.
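
To get a rough feel for why spreading paths helps but cannot guarantee separation, here is a toy calculation. It is only a sketch under strong assumptions that are not from this thread: each path picks one of the candidate switches independently and uniformly at random, which is not how OpenSM actually routes. The path count of 4 mirrors the four vHBAs described in the original message; the switch counts are hypothetical.

```python
from fractions import Fraction


def prob_all_paths_share_switch(num_switches: int, num_paths: int) -> Fraction:
    """Chance that num_paths independent, uniform picks all land on one switch.

    Toy model only (an assumption, not OpenSM behaviour).
    """
    if num_switches < 1 or num_paths < 1:
        raise ValueError("need at least one switch and one path")
    # Choose the shared switch (num_switches ways); every path must match it.
    return Fraction(num_switches, num_switches ** num_paths)


if __name__ == "__main__":
    # Hypothetical switch counts; 4 paths as in the four-vHBA case.
    for switches in (2, 3, 4):
        p = prob_all_paths_share_switch(switches, 4)
        print(f"{switches} switches, 4 paths: {p} (about {float(p):.4f})")
```

Even in this idealised model the chance is never zero, which is consistent with the caveat above that scatter_ports reduces the likelihood rather than eliminating it.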
>>
>> kevin
>>
>> On Oct 8, 2013, at 12:04 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:
>>
>> > We have been running Oracle OVN (previously Xsigo) in our data center
>> > environment for two years now. This last week we upgraded the firmware
>> > on our Mellanox IS5030 switches to see if that would help resolve some
>> > communication issues for Oracle PVI and IPoIB. Oracle OVN creates
>> > virtual NICs and virtual HBAs, encapsulates the traffic, and sends it
>> > over the InfiniBand fabric to the directors, where the data is
>> > decapsulated and sent out on the traditional Ethernet and Fibre Channel
>> > networks.
>> >
>> > During our upgrade, all four of our vHBAs happened to be routed through
>> > the same IS5030 switch, so all of the storage for some of our ESX hosts
>> > disappeared when that switch was rebooted. This caused an APD (All
>> > Paths Down) state, and some of the VMs suffered corruption. We are now
>> > looking for ways to make the routing spread routes evenly across the
>> > available paths to help reduce or prevent this in the future. We want an
>> > algorithm that focuses on availability and is part of the standard OFED
>> > OpenSM; we value stability over cutting edge. We are currently using
>> > MinHop as the routing algorithm. I'm attaching a diagram (
>> > https://docs.google.com/drawings/d/18pMOpiM7Bz2kaiyI0NOzB1q5o-0Zcy-9E1hJYNm6uNg/edit?usp=sharing
>> > ) of our environment, as I know different algorithms are tailored for
>> > different environments.
>> >
>> > We are also looking to extract our topology, load it into IBMgtSim,
>> > and run simulations on MinHop and other algorithms to see what the
>> > probability is of having all paths run through one switch. If you have
>> > any pointers, we would be glad to accept them. One difficulty is that
>> > ibnetdiscover shows me the port GUIDs of the HCAs, but not the node
>> > GUID of the card. I suppose that if I see CA, I can subtract the port
>> > number from the port GUID to get the node GUID. Would that be a safe
>> > assumption?
>> >
>> >> ibnetdiscover
>> > CA    56  2 0xf04da2909778e716 4x QDR - SW    52 18 0x0002c90200448e28 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
>> > CA    55  1 0xf04da2909778e715 4x QDR - SW    51 18 0x0002c90200448ec8 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
>> > ...snip...
>> >
>> >> ibhosts
>> > Ca      : 0xf04da2909778e714 ports 2 "MT25408 ConnectX Mellanox Technologies"
>> > ...snip...
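
The "port GUID minus port number" guess can be checked against the sample output above. The following is a minimal sketch, not a general rule: it only confirms that the subtraction reproduces the node GUID reported by ibhosts for this one card. The helper name `guess_node_guid` is hypothetical, and the offset is assumed to be a vendor numbering habit rather than something other hardware must follow.

```python
def guess_node_guid(port_guid: int, port_num: int) -> int:
    """Guess a node GUID by subtracting the port number from the port GUID.

    Assumption: the card numbers its port GUIDs as node GUID + port number.
    """
    return port_guid - port_num


# (port GUID, port number) pairs copied from the ibnetdiscover lines above.
observed_ports = [
    (0xF04DA2909778E716, 2),
    (0xF04DA2909778E715, 1),
]

# Node GUID reported by ibhosts for the same card.
reported_node_guid = 0xF04DA2909778E714

if __name__ == "__main__":
    for port_guid, port_num in observed_ports:
        guess = guess_node_guid(port_guid, port_num)
        matches = guess == reported_node_guid
        print(f"port {port_num}: guessed {guess:#018x}, matches ibhosts: {matches}")
```

For both ports in the sample the guess equals the ibhosts value, so the assumption holds here. Cross-checking each card against ibhosts this way is safer than relying on the offset blindly.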
>> >
>> >
>> > Thank you in advance for reading and helping us.
>> >
>> > Robert LeBlanc
>> > OIT Infrastructure & Virtualization Engineer
>> > Brigham Young University
>> > <Fabric Design Public.pdf>
>> > _______________________________________________
>> > Users mailing list
>> > Users at lists.openfabrics.org
>> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>>
>>