[Users] Subnet question

Robert LeBlanc robert_leblanc at byu.edu
Tue Oct 8 12:16:04 PDT 2013


Good to know. This isn't being done without Oracle's knowledge: we have an
SR open with them, and I'm exploring options and feeding my findings back
to them. They have mentioned in the past that we could use our own SM if
we wanted to.


Robert LeBlanc
OIT Infrastructure & Virtualization Engineer
Brigham Young University


On Tue, Oct 8, 2013 at 1:12 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:

> AFAIK Xsigo has their own version of OpenSM with some proprietary changes
> that were not given back so I'm not sure all the right things will happen
> if you use a different version of OpenSM.
>
> -- Hal
>
>
> On Tue, Oct 8, 2013 at 3:03 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:
>
>> Yes, we are running two Xsigo VP780s (now called Oracle Fabric Interconnect
>> F-15).
>>
>>
>> Robert LeBlanc
>> OIT Infrastructure & Virtualization Engineer
>> Brigham Young University
>>
>>
>> On Tue, Oct 8, 2013 at 1:00 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>>
>>> Are you running Xsigo hardware?
>>>
>>>
>>> On Tue, Oct 8, 2013 at 1:38 PM, Robert LeBlanc <robert_leblanc at byu.edu> wrote:
>>>
>>>> Kevin,
>>>>
>>>> Thanks for the input, I'll look into it. Does scatter_ports work with
>>>> MinHop, or do I need to use UpDown? The only place I've seen scatter
>>>> ports mentioned is
>>>> http://www2.cisl.ucar.edu/sites/default/files/Mizero,%20F_SIParCS2013.pdf
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Robert LeBlanc
>>>> OIT Infrastructure & Virtualization Engineer
>>>> Brigham Young University
>>>>
>>>>
>>>> On Tue, Oct 8, 2013 at 11:35 AM, Kevin Harms <harms at alcf.anl.gov> wrote:
>>>>
>>>>>
>>>>> This is not a full solution to your problem, but you can add/set the
>>>>> value
>>>>>
>>>>>   scatter_ports 37
>>>>>
>>>>> in your opensm.conf. This generally improves the distribution of the
>>>>> paths over the available ports. It will not guarantee that the routes
>>>>> to N never all flow through a single switch, but it makes that much
>>>>> less likely. The value of 37 is the initial seed for the algorithm. In
>>>>> our case, we did this to improve performance.
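
A minimal sketch of applying the setting above from a script, assuming
opensm.conf uses whitespace-separated "key value" lines as in Kevin's
example. The helper name set_opensm_option and the sample config text are
hypothetical, for illustration only; they are not part of OpenSM.

```python
def set_opensm_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key value` set, replacing an existing
    uncommented entry or appending a new one (hypothetical helper)."""
    out = []
    found = False
    for line in conf_text.splitlines():
        parts = line.split()
        # Match only active settings; "#key ..." comment lines are kept as-is.
        if parts and parts[0] == key:
            out.append(f"{key} {value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key} {value}")
    return "\n".join(out) + "\n"


# Illustrative config text, not taken from a real fabric.
sample = "routing_engine minhop\nscatter_ports 0\n"
updated = set_opensm_option(sample, "scatter_ports", "37")
print(updated)
```

The function works on text rather than on the file itself, so the result
can be reviewed before anything is written back and the SM is restarted.
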
>>>>>
>>>>> kevin
>>>>>
>>>>> On Oct 8, 2013, at 12:04 PM, Robert LeBlanc <robert_leblanc at byu.edu>
>>>>> wrote:
>>>>>
>>>>> > We have been running Oracle OVN (previously Xsigo) in our data center
>>>>> > environment for two years now. This last week we upgraded the firmware
>>>>> > on our Mellanox IS5030 switches to see if that would help resolve some
>>>>> > communication issues for Oracle PVI and IPoIB. The Oracle OVN creates
>>>>> > virtual NICs and virtual HBAs, encapsulates the traffic, and sends it
>>>>> > over the InfiniBand fabric to the directors, where the data is
>>>>> > decapsulated and sent out on the traditional Ethernet and Fibre Channel
>>>>> > networks.
>>>>> >
>>>>> > During our upgrade, it so happened that all four of our vHBAs were
>>>>> > routed through the same IS5030 switch, causing all of the storage for
>>>>> > some of our ESX hosts to disappear when the switch was rebooted. This
>>>>> > caused an APD (All Paths Down) state and some of the VMs suffered
>>>>> > corruption. We are now looking for ways to make the routing spread the
>>>>> > routes evenly across the available paths to help reduce or prevent this
>>>>> > in the future. We would want an algorithm that focuses on availability
>>>>> > and is part of the standard OFED OpenSM. We are looking for stability
>>>>> > over cutting edge. We are currently using MinHop for the routing
>>>>> > algorithm. I'm attaching a diagram (
>>>>> > https://docs.google.com/drawings/d/18pMOpiM7Bz2kaiyI0NOzB1q5o-0Zcy-9E1hJYNm6uNg/edit?usp=sharing
>>>>> > ) of our environment as I know different algorithms are tailored for
>>>>> > different environments.
>>>>> >
>>>>> > We are also looking to extract our topology, load it into IBMgtSim,
>>>>> > and run simulations on MinHop and other algorithms to see what the
>>>>> > probability of having all paths run through one switch is. If you have
>>>>> > any pointers, we would be glad to accept them. One difficulty is that
>>>>> > when I run ibnetdiscover, it shows me the ports of the HCAs, but not
>>>>> > the node GUID of the card. I suppose that if I see CA, I can subtract
>>>>> > the port number from the port GUID to get the host GUID. Would that be
>>>>> > a safe assumption?
>>>>> >
>>>>> >> ibnetdiscover
>>>>> > CA    56  2 0xf04da2909778e716 4x QDR - SW    52 18 0x0002c90200448e28 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
>>>>> > CA    55  1 0xf04da2909778e715 4x QDR - SW    51 18 0x0002c90200448ec8 ( 'MT25408 ConnectX Mellanox Technologies' - 'Infiniscale-IV Mellanox Technologies' )
>>>>> > ...snip...
>>>>> >
>>>>> >> ibhosts
>>>>> > Ca      : 0xf04da2909778e714 ports 2 "MT25408 ConnectX Mellanox Technologies"
>>>>> > ...snip...
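
The subtraction described above can be checked against the sample output
in this message. A minimal sketch follows; guess_node_guid is a
hypothetical helper, and the "port GUID minus port number" rule is only a
pattern observed in this output for these ConnectX cards, not something
the tools guarantee.

```python
def guess_node_guid(port_guid: str, port_num: int) -> str:
    """Guess a CA's node GUID by subtracting the port number from the
    port GUID (assumption: the card numbers its port GUIDs this way)."""
    return f"0x{int(port_guid, 16) - port_num:016x}"


# (port number, port GUID) pairs copied from the ibnetdiscover output above.
ports = [(2, "0xf04da2909778e716"), (1, "0xf04da2909778e715")]

# Node GUID reported by ibhosts for the same card.
reported = "0xf04da2909778e714"

for port_num, port_guid in ports:
    guess = guess_node_guid(port_guid, port_num)
    print(port_num, port_guid, guess, guess == reported)
```

Both ports map back to the node GUID that ibhosts reports for this card.
Cards from other vendors, or cards with reprogrammed GUIDs, may not follow
the pattern, so comparing against ibhosts for each card is the safer check.
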
>>>>> >
>>>>> >
>>>>> > Thank you in advance for reading and helping us.
>>>>> >
>>>>> > Robert LeBlanc
>>>>> > OIT Infrastructure & Virtualization Engineer
>>>>> > Brigham Young University
>>>>> > <Fabric Design Public.pdf>
>>>>> > _______________________________________________
>>>>> > Users mailing list
>>>>> > Users at lists.openfabrics.org
>>>>> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>