[ofa-general] IB interfaces occasionally go down & come up for no reason
Hal Rosenstock
hal.rosenstock at gmail.com
Thu Dec 18 08:37:45 PST 2008
Hi,
On Thu, Dec 18, 2008 at 3:28 AM, Sumeet Lahorani
<Sumeet.Lahorani at oracle.com> wrote:
>
> Hi,
>
> We sometimes see our IB interfaces go down and come back up within 2 or 3
> seconds for apparently no reason.
That can happen without anyone pulling cables or resetting ports, when
certain errors are present on the link.
> Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for
> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
-11 is EAGAIN
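
The kernel reports these failures as negative errno values, so the
status can be decoded by negating it and looking it up. A minimal
Python sketch of that lookup follows; the helper name decode_status is
made up for illustration, and errno numbering is platform-specific
(11 corresponds to EAGAIN on Linux):

```python
import errno
import os


def decode_status(status):
    """Return (symbolic name, message) for a kernel-style negative errno status."""
    code = abs(status)
    # errno.errorcode maps numeric codes to names on the running platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On a Linux host this is the value seen in the log lines above.
    print(decode_status(-11))
```

EAGAIN means "try again", i.e. the join could not complete at that
moment rather than being rejected outright.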
> Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for
> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle
> interface ib0, disabling it in 5000 ms.
> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle
> interface ib1, disabling it in 5000 ms.
> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after
> 2000 ms for interface ib0.
> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after
> 2000 ms for interface ib1.
>
> To mask these we've set downdelay & updelay to 5000. But can anybody tell me
> why these interfaces could be bouncing down & up like this? We are not
> pulling any cables, resetting ports or resetting switches when this happens.
> We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418
> [ConnectX IB DDR] HCAs.
Which SM (subnet manager) flavor are you running?
Would you dump the port counters and see how they change
before and after one of these "events"?
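
One way to do the comparison, as a rough sketch: save a counter dump
(for example from perfquery) before and after an event, then list only
the counters that moved. The line shape assumed below (counter name,
colon, dot padding, integer) may not match your tool version, the
function names are hypothetical, and the sample values are made up for
illustration, not taken from any real fabric:

```python
import re

# Assumed line shape: counter name, colon, optional dot padding, integer value.
COUNTER_LINE = re.compile(r"^\s*(\w+):\.*\s*(\d+)\s*$")


def parse_counters(text):
    """Parse a counter dump into {name: value}, skipping lines that do not match."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def changed_counters(before, after):
    """Return {name: delta} for every counter whose value differs between snapshots."""
    deltas = {}
    for name, new_value in after.items():
        old_value = before.get(name, 0)
        if new_value != old_value:
            deltas[name] = new_value - old_value
    return deltas


# Made-up sample snapshots, for illustration only.
BEFORE = """\
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............1
"""
AFTER = """\
SymbolErrorCounter:..............12
LinkErrorRecoveryCounter:........1
LinkDownedCounter:...............2
"""

if __name__ == "__main__":
    deltas = changed_counters(parse_counters(BEFORE), parse_counters(AFTER))
    for name, delta in sorted(deltas.items()):
        print(f"{name}: {delta:+d}")
```

Counters that increment across an event (symbol errors, link error
recovery, link downed) would point at the physical link rather than at
the bonding or IPoIB layer.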
-- Hal
> - Sumeet