[ofa-general] IPoIB connections falling off

Hal Rosenstock hal.rosenstock at gmail.com
Wed Jul 22 15:05:56 PDT 2009


On Wed, Jul 22, 2009 at 5:59 PM, Ira Weiny<weiny2 at llnl.gov> wrote:
> Check your multicast group membership and forwarding tables on the switches.
>
> We have had similar issues and have found that some nodes fail to join the multicast groups for various reasons.

Also, such failures should show up in the opensm log and at least give a hint
of the issue (e.g. rate or MTU).
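
A rough sketch of that log check, in case it helps. It only filters log
lines; it is not an OpenSM tool. The keyword list, the "ERR" marker test and
the helper names (suspicious_lines, scan_file) are assumptions made for
illustration, so adjust them to whatever your opensm log actually prints.

```python
import re
from typing import Iterable, List, Sequence

# Assumption: illustrative keywords only, loosely based on the causes
# mentioned above (multicast joins, rate, MTU). Not an official list.
KEYWORDS: Sequence[str] = ("multicast", "mcast", "join", "rate", "mtu")


def suspicious_lines(lines: Iterable[str],
                     keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return lines that contain 'ERR' and at least one keyword.

    Keywords are matched as whole words, case-insensitively, so that
    'rate' does not match inside words such as 'generate'.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for line in lines:
        # Assumption: error lines carry the literal marker 'ERR'.
        if "ERR" in line and pattern.search(line):
            hits.append(line.rstrip("\n"))
    return hits


def scan_file(path: str) -> List[str]:
    """Apply suspicious_lines() to a log file at a caller-supplied path."""
    with open(path, errors="replace") as log:
        return suspicious_lines(log)
```

Calling scan_file() with the path to your SM's log file returns the matching
lines; anything it finds is only a starting point for reading the
surrounding log entries by hand.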

-- Hal

>
> Ira
>
> On Wed, 22 Jul 2009 15:55:42 -0600
> Todd Bowman <twbowman at gmail.com> wrote:
>
>> I need a little direction to help solve an IPoIB issue.
>> Software: OFED 1.3 and 1.4 stacks, running OpenSM
>>
>>
>> Problem:
>> IPoIB connections fail, meaning a node cannot ping all or some of the other
>> IPoIB nodes.  IB itself is still up, we can run IB tests with success.  So
>> far the only resolution is to restart the IB stack.  Size of the cluster
>> seems to be irrelevant.  It has happened on clusters from around 64 to
>> 1000s.
>>
>>
>> My first instinct is that some information has been lost from SM/SA which is
>> needed to create an IPoIB connection, but I'm not sure what that
>> information is or how to verify that it is gone.
>>
>> Thanks in advance,
>>
>> Todd
>>
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> weiny2 at llnl.gov


