[ewg] RE: Slow failover of IPoIB ipoibtools/bonding (bug 541)
Scott Weitzenkamp (sweitzen)
sweitzen at cisco.com
Sat Apr 21 23:02:16 PDT 2007
The 10-second port failover test has been running with IPoIB UD ipoibtools
HA for over 8 hours, and there have been very few slow failovers:
$ grep seconds screenlog.7 | wc -l
29705
$ grep seconds screenlog.7 | fgrep -v "over 1." | fgrep -v "over 2."
Interim result: 45.29 10^6bits/s over 53.21 seconds
Interim result: 299.37 10^6bits/s over 7.34 seconds
Interim result: 406.76 10^6bits/s over 5.84 seconds
Interim result: 614.00 10^6bits/s over 3.91 seconds
Interim result: 579.55 10^6bits/s over 4.06 seconds
Interim result: 239.60 10^6bits/s over 10.19 seconds
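
The two fgrep filters above work on string prefixes ("over 1." and "over 2."). A numeric threshold does the same job and is a little less fragile. The following is only a minimal sketch, assuming the log lines have exactly the "Interim result: ... over N seconds" format shown above; the function name slow_intervals and the 3.0-second threshold are illustrative choices, not part of the test setup.

```python
import re

# Assumed line format, taken from the output above:
#   "Interim result: 45.29 10^6bits/s over 53.21 seconds"
INTERIM = re.compile(
    r"Interim result:\s+([\d.]+)\s+10\^6bits/s over\s+([\d.]+)\s+seconds"
)


def slow_intervals(lines, threshold=3.0):
    """Return (throughput, seconds) pairs whose interval is >= threshold.

    The default threshold of 3.0 seconds is a hypothetical cutoff that
    mirrors filtering out the "over 1." and "over 2." lines.
    """
    slow = []
    for line in lines:
        match = INTERIM.search(line)
        if match is None:
            continue  # not an interim-result line
        throughput = float(match.group(1))
        seconds = float(match.group(2))
        if seconds >= threshold:
            slow.append((throughput, seconds))
    return slow
```

To apply it to the log, something like slow_intervals(open("screenlog.7")) should list the same kind of outliers. Unlike the fgrep pair, it would not report intervals shorter than 1 second.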
Scott
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il]
> Sent: Thursday, April 19, 2007 8:27 PM
> To: Scott Weitzenkamp (sweitzen)
> Cc: EWG; Roland Dreier (rdreier); Michael S. Tsirkin; Sean Hefty; openib
> Subject: Re: Slow failover of IPoIB ipoibtools/bonding (bug 541)
>
> > Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> > Subject: Slow failover of IPoIB ipoibtools/bonding (bug 541)
> >
> > Roland, Michael, or Sean, this is what I see when IPoIB failover is slow.
> > How do we get this fixed?
> >
> >
> > ib0: Request connection 0x60406 for gid fe80:0000:0000:0000:0002:c902:0020:e1d9 qpn 0x404
> > ib0: REP received.
> > ib0: REQ arrived
> > ib0: failed cm send event (status=12, wrid=45 vend_err 81)
> > ib0: Destroy active connection 0x60406 head 0x6546f tail 0x6546e
> > ib0: Request connection 0x70406 for gid fe80:0000:0000:0000:0002:c902:0020:e1d9 qpn 0x404
>
> Scott, this is a result of the port going down; the message is benign.
> For simplicity, could you please check whether slow failover is also
> observed in datagram mode? This takes a couple of variables out of the
> equation.
>
> --
> MST
>