[ofa-general] IB interfaces occasionally go down & come up for no reason

Sumeet Lahorani Sumeet.Lahorani at oracle.com
Thu Dec 18 08:54:04 PST 2008


We are using the SM on the Voltaire switch.

I could collect before & after snapshots of the port counters if I had a 
way of knowing when the event was about to happen. The problem is I 
don't. I guess we could run ibqueryerrors.pl every 5 seconds or so and 
correlate this event based on the timestamp.
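
A minimal sketch of that polling idea, assuming Python 3.7+ is available on the host and that ibqueryerrors.pl can be run without arguments (the exact flags differ between infiniband-diags versions, so treat the command line as a placeholder). It prints only the counter lines that changed since the previous snapshot, prefixed with a syslog-style timestamp so they can be lined up against the bonding messages. The function names are made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def snapshot(cmd):
    """Run the counter-dump command and return its stdout as a list of lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.splitlines()


def changed_lines(before, after):
    """Return the lines of the new snapshot that were not in the previous one."""
    seen = set(before)
    return [line for line in after if line not in seen]


def poll(cmd, interval=5.0, iterations=None, log=print):
    """Take a snapshot every `interval` seconds and log whatever changed."""
    previous = snapshot(cmd)
    count = 0
    while iterations is None or count < iterations:
        time.sleep(interval)
        current = snapshot(cmd)
        stamp = datetime.now().strftime("%b %d %H:%M:%S")
        for line in changed_lines(previous, current):
            log(f"{stamp} {line}")
        previous = current
        count += 1


if __name__ == "__main__":
    # Assumption: the command name and (absent) arguments are placeholders.
    poll(["ibqueryerrors.pl"], interval=5.0)
```

Redirecting the output to a file and grepping around the time of a "link status down" message would give the before & after view of the counters.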

Is there some tracing I could turn on to dump out the reason for the 
link bounce?

Do you have some examples of the errors that can lead to such a link bounce?

- Sumeet

Hal Rosenstock wrote:
> Hi,
>
> On Thu, Dec 18, 2008 at 3:28 AM, Sumeet Lahorani
> <Sumeet.Lahorani at oracle.com> wrote:
>   
>> Hi,
>>
>> We sometimes see our IB interfaces go down and come back up within 2 or 3
>> seconds for apparently no reason.
>>     
>
> That can occur without cable pulling, etc. when certain errors are
> present on the link.
>
>   
>> Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
>>     
>
> -11 is EAGAIN
>
>   
>> Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
>> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle
>>  interface ib0, disabling it in 5000 ms.
>> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle
>>  interface ib1, disabling it in 5000 ms.
>> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after
>> 2000 ms for interface ib0.
>> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after
>> 2000 ms for interface ib1.
>>
>> To mask these we've set downdelay & updelay to 5000. But can anybody tell me
>> why these interfaces could be bouncing down & up like this? We are not
>> pulling any cables, resetting ports or resetting switches when this happens.
>> We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418
>> [ConnectX IB DDR] HCAs.
>>     
>
> Which SM flavor ?
>
> Would you dump out the port counters and see how they change
> before and after one of these "events"?
>
> -- Hal
>
>   
>> - Sumeet
>>
>> _______________________________________________
>> general mailing list
>> general at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
>>
>>     



