<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]--><o:SmartTagType
namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="PersonName"/>
<!--[if !mso]>
<style>
st1\:*{behavior:url(#default#ieooui) }
</style>
<![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:Arial;
color:navy;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>We disconnected one port on the dual-port IB
card. The second port was not configured, so I can't imagine it caused problems,
but it is now completely disconnected.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>As for a Subnet Manager on OpenSolaris -
there isn't one. I believe they do have one for Solaris, but I do not believe
that it's been released to OpenSolaris, and I can't find it anywhere on our
system.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>
<div>
<div class=MsoNormal align=center style='text-align:center'><font size=3
face="Times New Roman"><span style='font-size:12.0pt'>
<hr size=2 width="100%" align=center tabindex=-1>
</span></font></div>
<p class=MsoNormal><b><font size=2 face=Tahoma><span style='font-size:10.0pt;
font-family:Tahoma;font-weight:bold'>From:</span></font></b><font size=2
face=Tahoma><span style='font-size:10.0pt;font-family:Tahoma'>
richard@informatix-sol.com [mailto:richard@informatix-sol.com] <br>
<b><span style='font-weight:bold'>Sent:</span></b> Thursday, July 01, 2010
12:54 AM<br>
<b><span style='font-weight:bold'>To:</span></b> Matt Breitbach;
ewg@lists.openfabrics.org<br>
<b><span style='font-weight:bold'>Subject:</span></b> Re: [ewg] Infiniband
Interoperability</span></font><o:p></o:p></p>
</div>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal style='margin-bottom:12.0pt'><font size=3
face="Times New Roman"><span style='font-size:12.0pt'>When I had multiple SMs
running, none of them reported it as a problem.<br>
Sun developed their own for Solaris. I can't recall now what they called it.<br>
<br>
The other possibility I've seen cause problems with IPoIB is having two ports on
the same IP subnet. Either bond them or disable ARP responses on one port. This
is because IPoIB simulates broadcast over multicast.<br>
<br>
<br>
Richard<br>
<br>
----- Reply message -----<br>
From: "Matt Breitbach" &lt;<st1:PersonName w:st="on">matthewb@flash.shanje.com</st1:PersonName>&gt;<br>
Date: Wed, Jun 30, 2010 19:32<br>
Subject: [ewg] Infiniband Interoperability<br>
To: &lt;richard@informatix-sol.com&gt;, &lt;ewg@lists.openfabrics.org&gt;<br>
<br>
The Mellanox switch as far as I can tell does not have any SM running. It<br>
is a pretty dumb switch and there really isn't much to configure on it.<br>
<br>
<br>
<br>
LID 6 is the LID that OpenSM is running on - which is our CentOS 5.5 blade.<br>
I believe that it's reporting the issue since it's the Subnet Manager.<br>
<br>
<br>
<br>
The only other possibility is that there is a subnet manager running on the<br>
OpenSolaris box, but I have not been able to find one to blame this on. I<br>
would also think that in the OpenSM.log file I would find some reports of an<br>
election of some sort if there were multiple SMs running.<br>
<br>
<br>
<br>
LID listing : <br>
<br>
<br>
<br>
LID 1 - SuperMicro 4U running OpenSolaris (InfiniHost III EX PCI-E card w/<br>
128MB RAM)<br>
<br>
LID 2 - Blade Server currently running CentOS 5.5 and Xen (ConnectX<br>
Mezzanine card)<br>
<br>
LID 3 - InfiniScale III Switch<br>
<br>
LID 4 - SuperMicro 4U running OpenSolaris (InfiniHost III EX PCI-E card w/<br>
128MB RAM - 2nd port)<br>
<br>
LID 5 - Blade Server running Windows 2008R2 (ConnectX Mezzanine card)<br>
<br>
LID 6 - Blade Server running CentOS 5.5 and OpenSM (ConnectX Mezzanine card)<br>
<br>
LID 7 - Blade Server running Windows 2008 (InfiniHost III EX Mezzanine card)<br>
<br>
<br>
<br>
As for toggling the enable state - according to ibdiagnet the lowest<br>
connected rate member is at 20Gbps, but the network is only operating at<br>
10Gbps. I'm not sure which system I would toggle the enable state for.<br>
<br>
<br>
<br>
-Matt Breitbach<br>
<br>
_____ <br>
<br>
From: richard@informatix-sol.com [mailto:richard@informatix-sol.com] <br>
Sent: Wednesday, June 30, 2010 1:14 PM<br>
To: Matt Breitbach; ewg@lists.openfabrics.org<br>
Subject: Re: [ewg] Infiniband Interoperability<br>
<br>
<br>
<br>
I'm still suspicious that you have more than one SM running. Mellanox<br>
switches have it enabled by default.<br>
It's common for ARP requests, such as those caused by ping, to result in multicast<br>
group activity. <br>
InfiniBand creates these groups on demand and tears them down if there are no<br>
current members. There is no broadcast address. It uses a dedicated MC<br>
group.<br>
They all seem to originate from LID 6, so you can trace the source.<br>
<br>
If you have ports at non-optimal speeds, try toggling their enable state.<br>
This often fixes it. <br>
<br>
Richard<br>
<br>
----- Reply message -----<br>
From: "Matt Breitbach" &lt;<st1:PersonName w:st="on">matthewb@flash.shanje.com</st1:PersonName>&gt;<br>
Date: Wed, Jun 30, 2010 15:33<br>
Subject: [ewg] Infiniband Interoperability<br>
To: &lt;ewg@lists.openfabrics.org&gt;<br>
<br>
Well, let me throw out a little about the environment : <br>
<br>
<br>
<br>
We are running one SuperMicro 4U system with a Mellanox InfiniHost III EX<br>
card w/ 128MB RAM. This box is the OpenSolaris box. It's running the<br>
OpenSolaris Infiniband stack, but no SM. Both ports are cabled to the IB<br>
Switch to ports 1 and 2.<br>
<br>
<br>
<br>
The other systems are in a SuperMicro Bladecenter. The switch in the<br>
BladeCenter is an InfiniScale III switch with 10 internal ports and 10<br>
external ports.<br>
<br>
<br>
<br>
3 blades are connected with Mellanox ConnectX Mezzanine cards. 1 blade is<br>
connected with an InfiniHost III EX Mezzanine card.<br>
<br>
<br>
<br>
One of the blades is running CentOS and the 1.5.1 OFED release. OpenSM is<br>
running on that system, and is the only SM running on the network. This<br>
blade is using a ConnectX Mezzanine card.<br>
<br>
<br>
<br>
One blade is running Windows 2008 with the latest OFED drivers installed.<br>
It is using an InfiniHost III EX Mezzanine card.<br>
<br>
<br>
<br>
One blade is running Windows 2008 R2 with the latest OFED drivers installed.<br>
It is using a ConnectX Mezzanine card.<br>
<br>
<br>
<br>
One blade has been switching between Windows 2008 R2 and CentOS with Xen.<br>
Windows 2008 is running the latest OFED drivers; CentOS is running the 1.5.2<br>
RC2. That blade is using a ConnectX Mezzanine card.<br>
<br>
<br>
<br>
All of the firmware has been updated on the Mezzanine cards, the PCI-E<br>
InfiniHost III EX card, and the switch. All of the Windows boxes are<br>
configured to use Connected mode. I have not changed any other settings on<br>
the Linux boxes.<br>
<br>
<br>
<br>
As of right now, the network seems stable. I've been running pings for the<br>
last 12 hours, and nothing has dropped.<br>
<br>
<br>
<br>
I did notice in the OpenSM log though some odd entries that I do not believe<br>
belong there. <br>
<br>
<br>
<br>
Jun 30 06:56:26 832438 [B5723B90] 0x02 -> log_notice: Reporting Generic<br>
Notice type:3 num:67 (Mcast group deleted) from LID:6<br>
GID:ff12:1405:ffff::3333:1:2<br>
<br>
Jun 30 06:57:53 895990 [B5723B90] 0x02 -> log_notice: Reporting Generic<br>
Notice type:3 num:66 (New mcast group created) from LID:6<br>
GID:ff12:1405:ffff::3333:1:2<br>
<br>
Jun 30 07:18:06 770861 [B6124B90] 0x02 -> log_notice: Reporting Generic<br>
Notice type:3 num:67 (Mcast group deleted) from LID:6<br>
GID:ff12:1405:ffff::3333:1:2<br>
<br>
Jun 30 07:19:14 835273 [B5723B90] 0x02 -> log_notice: Reporting Generic<br>
Notice type:3 num:66 (New mcast group created) from LID:6<br>
GID:ff12:1405:ffff::3333:1:2<br>
<br>
<br>
<br>
<br>
<br>
I would not expect mcast groups to be created or deleted when<br>
there are no new adapters being added to the network, especially in a<br>
network this small. Is it odd to see those messages?<br>
<br>
<br>
<br>
Also, I have a warning when I run ibdiagnet - "Suboptimal rate for group.<br>
Lowest member rate: 20Gbps > group-rat<o:p></o:p></span></font></p>
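<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>To check whether every create/delete notice in the OpenSM log excerpt
quoted earlier in this thread really comes from the same LID and multicast GID,
the entries could be tallied with a short Python sketch along these lines. The
regular expression is an assumption based only on the four lines quoted in this
thread, not on any documented OpenSM log format, and parse_notices and summarize
are hypothetical helper names.</span></font></p>
<pre>
```python
import re
from collections import Counter

# Sample lines copied from the log excerpt quoted in the thread,
# with each wrapped entry rejoined onto a single line.
SAMPLE_LOG = """\
Jun 30 06:56:26 832438 [B5723B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:67 (Mcast group deleted) from LID:6 GID:ff12:1405:ffff::3333:1:2
Jun 30 06:57:53 895990 [B5723B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:6 GID:ff12:1405:ffff::3333:1:2
Jun 30 07:18:06 770861 [B6124B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:67 (Mcast group deleted) from LID:6 GID:ff12:1405:ffff::3333:1:2
Jun 30 07:19:14 835273 [B5723B90] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:6 GID:ff12:1405:ffff::3333:1:2
"""

# Assumed line layout: timestamp, then free text, then
# "num:N (description) from LID:N GID:..." as seen in the sample above.
NOTICE_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*"
    r"num:(\d+) \(([^)]*)\) from LID:(\d+) GID:(\S+)"
)


def parse_notices(text):
    """Return a list of (timestamp, num, description, lid, gid) tuples."""
    notices = []
    for line in text.splitlines():
        match = NOTICE_RE.search(line)
        if match:
            stamp, num, desc, lid, gid = match.groups()
            notices.append((stamp, int(num), desc, int(lid), gid))
    return notices


def summarize(notices):
    """Count notices per (LID, GID, description)."""
    return Counter((lid, gid, desc) for _, _, desc, lid, gid in notices)


if __name__ == "__main__":
    counts = summarize(parse_notices(SAMPLE_LOG))
    for (lid, gid, desc), count in sorted(counts.items()):
        print(f"LID {lid} GID {gid}: {desc} x{count}")
```
</pre>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>Run against a full log file instead of SAMPLE_LOG, a tally like this
would show at a glance whether any LID other than 6, or any other group, is
involved in the create/delete churn.</span></font></p>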
</div>
</body>
</html>