[ewg] Infiniband Interoperability

Hal Rosenstock hal.rosenstock at gmail.com
Thu Jul 8 03:44:52 PDT 2010


On Wed, Jul 7, 2010 at 8:04 PM, David Brean <david.brean at oracle.com> wrote:
> Correct, a SM hasn't been released for OpenSolaris, yet.
>
> Looks like a very unusual multicast address because it doesn't have the IPoIB or Subnet Administrator signature.

Yes, it's something in Windows that does that. Not sure what it's used
for. Sean asked about it last week but there has been no response as
yet.
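
If anyone wants to chase it down, the group and which ports have joined it
can be dumped with the infiniband-diags tools; a minimal sketch, run from
any host with the diags installed and an active port:

    # list every multicast group the SA knows about, with their MGIDs
    saquery -g
    # dump MCMemberRecords to see which port GIDs have joined each group
    saquery -m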

-- Hal

>
> -David
>
> On 7/7/10 6:37 PM, Matt Breitbach wrote:
>> We disconnected one port on the dual-port IB card. The second port was
>> not configured, so I can't imagine it was causing problems, but it is
>> completely disconnected now.
>>
>> As for a Subnet Manager on OpenSolaris - there isn't one. I believe they
>> do have one for Solaris, but I do not believe that it's been released to
>> OpenSolaris, and I can't find it anywhere on our system.
>>
>> ------------------------------------------------------------------------
>>
>> From: richard at informatix-sol.com [mailto:richard at informatix-sol.com]
>> Sent: Thursday, July 01, 2010 12:54 AM
>> To: Matt Breitbach; ewg at lists.openfabrics.org
>> Subject: Re: [ewg] Infiniband Interoperability
>>
>> When I had multiple SMs running, none reported it as a problem.
>> Sun developed their own for Solaris; I can't recall now what they called it.
>>
>> The other possibility I've seen cause problems with IPoIB is having two
>> ports on the same IP subnet. Either bond them or disable ARP responses
>> on one port. This is due to the way broadcast is simulated over multicast.
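>>
>> On a Linux host, for example, ARP responses can be suppressed per interface
>> with sysctls; a minimal sketch (ib1 as the second port is an assumption,
>> adjust to your setup):
>>
>>     # reply to ARP only when the target IP is configured on the receiving interface
>>     sysctl -w net.ipv4.conf.ib1.arp_ignore=1
>>     # always source ARP announcements from the best local address for the target
>>     sysctl -w net.ipv4.conf.ib1.arp_announce=2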
>>
>>
>> Richard
>>
>> ----- Reply message -----
>> From: "Matt Breitbach" <matthewb at flash.shanje.com>
>> Date: Wed, Jun 30, 2010 19:32
>> Subject: [ewg] Infiniband Interoperability
>> To: <richard at informatix-sol.com>, <ewg at lists.openfabrics.org>
>>
>> The Mellanox switch, as far as I can tell, does not have an SM running. It
>> is a pretty dumb switch, and there really isn't much to configure on it.
>>
>>
>>
>> LID 6 is the LID that OpenSM is running on, which is our CentOS 5.5 blade.
>> I believe it's reporting the issue because it's the Subnet Manager.
>>
>>
>>
>> The only other possibility is that there is a subnet manager running on the
>> OpenSolaris box, but I have not been able to find one to blame this on. I
>> would also expect to find reports of some sort of election in the OpenSM
>> log if there were multiple SMs running.
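>>
>> As a sanity check, any host with infiniband-diags installed can report the
>> SM it sees; a minimal sketch:
>>
>>     # report the master SM's LID, priority, and state as seen from this port
>>     sminfo
>>     # list every SMInfoRecord the SA holds - more than one row means more than one SM
>>     saquery -s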
>>
>>
>>
>> LID listing :
>>
>>
>>
>> LID 1 - SuperMicro 4U running OpenSolaris (InfiniHost III EX PCI-E card w/
>> 128MB RAM)
>>
>> LID 2 - Blade Server currently running CentOS 5.5 and Xen (ConnectX
>> Mezzanine card)
>>
>> LID 3 - InfiniScale III Switch
>>
>> LID 4 - SuperMicro 4U running OpenSolaris (InfiniHost III EX PCI-E card w/
>> 128MB RAM - 2nd port)
>>
>> LID 5 - Blade Server running Windows 2008R2 (ConnectX Mezzanine card)
>>
>> LID 6 - Blade Server running CentOS 5.5 and OpenSM (ConnectX Mezzanine card)
>>
>> LID 7 - Blade Server running Windows 2008 (InfiniHost III EX Mezzanine card)
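>>
>> For reference, a map like the one above can also be generated automatically;
>> a minimal sketch, assuming the diags are installed:
>>
>>     # walk the fabric and print its topology, including each node's LIDs
>>     ibnetdiscover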
>>
>>
>>
>> As for toggling the enable state: according to ibdiagnet, the lowest
>> connected rate member is at 20Gbps, but the multicast group is only
>> operating at 10Gbps. I'm not sure which system I should toggle the
>> enable state on.
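>>
>> The negotiated width/speed of each port can be read directly, which should
>> identify the slow link; a minimal sketch (LID and port number are
>> placeholders, substitute each LID from the list above):
>>
>>     # query PortInfo for port 1 of the node at LID 1;
>>     # LinkWidthActive and LinkSpeedActive show the negotiated rate
>>     smpquery portinfo 1 1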
>>
>>
>>
>> -Matt Breitbach
>>
>> _____
>>
>> From: richard at informatix-sol.com [mailto:richard at informatix-sol.com]
>> Sent: Wednesday, June 30, 2010 1:14 PM
>> To: Matt Breitbach; ewg at lists.openfabrics.org
>> Subject: Re: [ewg] Infiniband Interoperability
>>
>>
>>
>> I'm still suspicious that you have more than one SM running. Mellanox
>> switches have it enabled by default.
>> It's common for ARP requests, such as those caused by ping, to result in
>> multicast group activity.
>> InfiniBand creates these groups on demand and tears them down when there
>> are no current members. There is no broadcast address; a dedicated MC
>> group is used instead.
>> They all seem to originate from LID 6, so you can trace the source.
>>
>> If you have ports at non-optimal speeds, try toggling their enable state.
>> This often fixes it.
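>>
>> A minimal sketch of the toggle, assuming the slow link is port 1 of the
>> node at LID 1 (substitute your own LID and port):
>>
>>     # force the link down, then bring it back up to renegotiate width/speed
>>     ibportstate 1 1 disable
>>     ibportstate 1 1 enable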
>>
>> Richard
>>
>> ----- Reply message -----
>> From: "Matt Breitbach" <matthewb at flash.shanje.com>
>> Date: Wed, Jun 30, 2010 15:33
>> Subject: [ewg] Infiniband Interoperability
>> To: <ewg at lists.openfabrics.org>
>>
>> Well, let me throw out a little about the environment :
>>
>>
>>
>> We are running one SuperMicro 4U system with a Mellanox InfiniHost III EX
>> card w/ 128MB RAM. This box is the OpenSolaris box. It's running the
>> OpenSolaris InfiniBand stack, but no SM. Both ports are cabled to ports 1
>> and 2 of the IB switch.
>>
>>
>>
>> The other systems are in a SuperMicro BladeCenter. The switch in the
>> BladeCenter is an InfiniScale III switch with 10 internal ports and 10
>> external ports.
>>
>>
>>
>> Three blades are connected with Mellanox ConnectX Mezzanine cards. One
>> blade is connected with an InfiniHost III EX Mezzanine card.
>>
>>
>>
>> One of the blades is running CentOS and the 1.5.1 OFED release. OpenSM is
>> running on that system, and is the only SM running on the network. This
>> blade is using a ConnectX Mezzanine card.
>>
>>
>>
>> One blade is running Windows 2008 with the latest OFED drivers installed.
>> It is using an InfiniHost III EX Mezzanine card.
>>
>>
>>
>> One blade is running Windows 2008 R2 with the latest OFED drivers installed.
>> It is using a ConnectX Mezzanine card.
>>
>>
>>
>> One blade has been switching between Windows 2008 R2 and CentOS with Xen.
>> Windows 2008 R2 is running the latest OFED drivers; CentOS is running the
>> 1.5.2 RC2. That blade is using a ConnectX Mezzanine card.
>>
>>
>>
>> All of the firmware has been updated on the Mezzanine cards, the PCI-E
>> InfiniHost III EX card, and the switch. All of the Windows boxes are
>> configured to use Connected mode. I have not changed any other settings on
>> the Linux boxes.
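>>
>> For what it's worth, on the Linux side the IPoIB mode is exposed through
>> sysfs; a minimal sketch (ib0 is an assumption):
>>
>>     # show whether the interface is in datagram or connected mode
>>     cat /sys/class/net/ib0/mode
>>     # switch to connected mode (distro network scripts may override this at boot)
>>     echo connected > /sys/class/net/ib0/mode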
>>
>>
>>
>> As of right now, the network seems stable. I've been running pings for the
>> last 12 hours, and nothing has dropped.
>>
>>
>>
>> I did notice some odd entries in the OpenSM log, though, that I do not
>> believe belong there.
>>
>>
>>
>> Jun 30 06:56:26 832438 [B5723B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:67 (Mcast group deleted) from LID:6 GID:ff12:1405:ffff::3333:1:2
>>
>> Jun 30 06:57:53 895990 [B5723B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:6 GID:ff12:1405:ffff::3333:1:2
>>
>> Jun 30 07:18:06 770861 [B6124B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:67 (Mcast group deleted) from LID:6 GID:ff12:1405:ffff::3333:1:2
>>
>> Jun 30 07:19:14 835273 [B5723B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:6 GID:ff12:1405:ffff::3333:1:2
>>
>>
>>
>>
>>
>> I would not expect mcast groups to be created or deleted when no new
>> adapters are being added to the network, especially in a network this
>> small. Is it odd to see those messages?
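>>
>> A quick way to see how often that group churns is to count the notices in
>> the OpenSM log; a minimal sketch (the log path is an assumption, adjust to
>> your install):
>>
>>     # tally mcast create (num:66) and delete (num:67) notices per hour
>>     grep -E 'num:6[67]' /var/log/opensm.log | awk '{print substr($3,1,2)}' | sort | uniq -c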
>>
>>
>>
>> Also, I get a warning when I run ibdiagnet: "Suboptimal rate for group.
>> Lowest member rate: 20Gbps > group-rate: 10Gbps".
>>
>>
>>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>


