<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Here's the output of smpquery portinfo -D 0, as requested below:<br>
<br>
[root@h2o01 ~]# smpquery portinfo -D 0<br>
# Port info: DR path 0 port 0<br>
Mkey:............................0x0000000000000000<br>
GidPrefix:.......................0xfe80000000000000<br>
Lid:.............................0x0003<br>
SMLid:...........................0x0001<br>
CapMask:.........................0x2510a68<br>
                IsTrapSupported<br>
                IsAutomaticMigrationSupported<br>
                IsSLMappingSupported<br>
                IsLedInfoSupported<br>
                IsSystemImageGUIDsupported<br>
                IsCommunicatonManagementSupported<br>
                IsVendorClassSupported<br>
                IsCapabilityMaskNoticeSupported<br>
                IsClientRegistrationSupported<br>
DiagCode:........................0x0000<br>
MkeyLeasePeriod:.................0<br>
LocalPort:.......................1<br>
LinkWidthEnabled:................1X or 4X<br>
LinkWidthSupported:..............1X or 4X<br>
LinkWidthActive:.................4X<br>
LinkSpeedSupported:..............2.5 Gbps<br>
LinkState:.......................Active<br>
PhysLinkState:...................LinkUp<br>
LinkDownDefState:................Polling<br>
ProtectBits:.....................0<br>
LMC:.............................0<br>
LinkSpeedActive:.................2.5 Gbps<br>
LinkSpeedEnabled:................2.5 Gbps<br>
NeighborMTU:.....................2048<br>
SMSL:............................0<br>
VLCap:...........................VL0-3<br>
InitType:........................0x00<br>
VLHighLimit:.....................0<br>
VLArbHighCap:....................8<br>
VLArbLowCap:.....................8<br>
InitReply:.......................0x00<br>
MtuCap:..........................2048<br>
VLStallCount:....................7<br>
HoqLife:.........................0<br>
OperVLs:.........................VL0-3<br>
PartEnforceInb:..................0<br>
PartEnforceOutb:.................0<br>
FilterRawInb:....................0<br>
FilterRawOutb:...................0<br>
MkeyViolations:..................0<br>
PkeyViolations:..................0<br>
QkeyViolations:..................0<br>
GuidCap:.........................32<br>
ClientReregister:................0<br>
SubnetTimeout:...................17<br>
RespTimeVal:.....................16<br>
LocalPhysErr:....................15<br>
OverrunErr:......................15<br>
MaxCreditHint:...................0<br>
RoundTrip:.......................0<br>
<br>
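As a rough check of the rate question raised in the quoted reply, the active link rate can be derived from the two fields above (LinkWidthActive times LinkSpeedActive). This is only a sketch: the helper names parse_portinfo and active_rate_gbps are made up for illustration, the parser assumes the "Name:....value" layout shown above, and the 10 Gbps group rate is the default mentioned in Hal's reply.<br>
<br>
<pre wrap="">
```python
import re


def parse_portinfo(text):
    """Return a dict of field name to value from smpquery portinfo text."""
    fields = {}
    for line in text.splitlines():
        # Lines look like "Name:.........value"; capability lines are skipped.
        match = re.match(r"^(\w+):\.+(.*)$", line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def active_rate_gbps(fields):
    """Multiply the active lane count by the per-lane signalling rate."""
    width = int(fields["LinkWidthActive"].rstrip("Xx"))
    speed = float(fields["LinkSpeedActive"].split()[0])
    return width * speed


# Two lines copied from the output above.
SAMPLE = """\
LinkWidthActive:.................4X
LinkSpeedActive:.................2.5 Gbps
"""

# Default IPoIB broadcast group rate according to the quoted reply (assumption
# that the subnet manager has not been configured with a different rate).
GROUP_RATE_GBPS = 10.0

rate = active_rate_gbps(parse_portinfo(SAMPLE))
print(rate, rate >= GROUP_RATE_GBPS)
```
</pre>
For the values above this gives 4 x 2.5 = 10 Gbps, which matches the default group rate, so the port itself does not look slower than the broadcast group.<br>
<br>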
<br>
I did some checking, and it's not just this node: all
nodes seem to be having the same problem.<br>
<br>
jeff<br>
<br>
<br>
Hal Rosenstock wrote:
<blockquote
 cite="mid:f0e08f230903171440i51d22b0as888fed2aa2a1f91e@mail.gmail.com"
 type="cite">
  <pre wrap="">2009/3/17 jeffrey Lang <a class="moz-txt-link-rfc2396E" href="mailto:jrlang@uwyo.edu"><jrlang@uwyo.edu></a>:
  </pre>
  <blockquote type="cite">
    <pre wrap="">First let me say, I hope this is the right list for this email, if not
please forgive me.

I have a small 16-node compute cluster.  The university where I work
recently opened a new data center, and my cluster was moved there from the
old one.  Before the move InfiniBand was working properly; after the move,
IPoIB stopped working.

The cluster runs CentOS 4 with all the latest updates and the OFED code
distributed with CentOS.  My plan was to update the OFED code once things had
restabilized.

For the move, I shut down the cluster and removed the InfiniBand cables, and
the cluster was moved.  I then reinstalled the InfiniBand cables (not in the
same order as before the move) and brought everything back up.

When I brought the cluster back up, IPoIB would not work.  The only
message in the log file is "Mar 15 04:04:32 h2o01 kernel: ib0: multicast
join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22".
    </pre>
  </blockquote>
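  <pre wrap="">
The group address in that log line can be taken apart to see which
multicast group the join was for. The sketch below assumes the IPv4
IPoIB MGID layout described in RFC 4391; the function name
decode_ipoib_mgid is made up for illustration. It also looks up
error number 22, which is what "status -22" would correspond to if
the kernel is reporting a negated errno value.

```python
import errno


def decode_ipoib_mgid(mgid):
    """Split an IPoIB IPv4 multicast GID into its fields (RFC 4391 layout)."""
    groups = [int(part, 16) for part in mgid.split(":")]
    if len(groups) != 8:
        raise ValueError("expected 8 colon-separated 16-bit groups")
    return {
        "prefix": groups[0] // 256,        # 0xff marks a multicast GID
        "flags": (groups[0] // 16) % 16,
        "scope": groups[0] % 16,           # 0x2 is link-local
        "signature": groups[1],            # 0x401b marks IPoIB carrying IPv4
        "pkey": groups[2],                 # 0xffff is the default partition
        "group_id": groups[6] * 65536 + groups[7],
    }


info = decode_ipoib_mgid("ff12:401b:ffff:0000:0000:0000:ffff:ffff")
print({name: hex(value) for name, value in info.items()})

# Symbolic name for errno 22 on this platform.
print(errno.errorcode[22])
```

Read this way, the address is the IPv4 broadcast group on the default
partition (P_Key 0xffff), and error 22 is EINVAL.
  </pre>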
  <pre wrap=""><!---->
I think that there may be a rate issue in terms of this node relative
to the IPoIB broadcast group which by default is 10 Gbps (4x SDR).
What does this node's portinfo show (smpquery portinfo -D 0) in terms
of link width and speed?

-- Hal

  </pre>
  <blockquote type="cite">
    <pre wrap="">The master node can see all the systems:

[root@h2o01 log]# ibnodes
Ca    : 0x00066a0098007e99 ports 1 "h2o17 HCA-1"
Ca    : 0x00066a0098007e9b ports 1 "h2o18 HCA-1"
Ca    : 0x00066a0098007e97 ports 1 "h2o16 HCA-1"
Ca    : 0x00066a0098007e8c ports 1 "h2o15 HCA-1"
Ca    : 0x00066a0098007e94 ports 1 "h2o14 HCA-1"
Ca    : 0x00066a0098007e93 ports 1 "h2o13 HCA-1"
Ca    : 0x00066a0098007e8e ports 1 "h2o12 HCA-1"
Ca    : 0x00066a0098007e90 ports 1 "h2o11 HCA-1"
Ca    : 0x00066a0098007e98 ports 1 "h2o10 HCA-1"
Ca    : 0x00066a0098007e95 ports 1 "h2o09 HCA-1"
Ca    : 0x00066a0098007e8f ports 1 "h2o08 HCA-1"
Ca    : 0x00066a0098007e92 ports 1 "h2o07 HCA-1"
Ca    : 0x00066a0098007e8d ports 1 "h2o06 HCA-1"
Ca    : 0x00066a0098007e91 ports 1 "h2o05 HCA-1"
Ca    : 0x00066a0098007e96 ports 1 "h2ocfs HCA-1"
Ca    : 0x00066a0098007e9c ports 1 "h2o01 HCA-1"
Switch    : 0x00066a00d8000593 ports 24 "SilverStorm 9024
GUID=0x00066a00d8000593" enhanced port 0 lid 1 lmc 0

I've reset the SM on the switch, but nothing seems to work.

Any ideas on where to look for what's causing the problem?

jeff

    </pre>
  </blockquote>
</blockquote>
</body>
</html>