[ewg] RE: Slow failover of IPoIB ipoibtools/bonding (bug 541)

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Sat Apr 21 23:02:16 PDT 2007


The 10-second port failover test has been running with IPoIB UD ipoibtools
HA for over 8 hours, and slow failovers have been rare: of the 29705
interim results logged, only 6 are outside the 1.x-2.x second range:

$ grep seconds screenlog.7 | wc -l
29705

$ grep seconds screenlog.7 | fgrep -v "over 1." | fgrep -v "over 2."
Interim result:   45.29 10^6bits/s over 53.21 seconds
Interim result:  299.37 10^6bits/s over 7.34 seconds
Interim result:  406.76 10^6bits/s over 5.84 seconds
Interim result:  614.00 10^6bits/s over 3.91 seconds
Interim result:  579.55 10^6bits/s over 4.06 seconds
Interim result:  239.60 10^6bits/s over 10.19 seconds
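
The same filtering can be sketched in Python. This is only a rough equivalent of the grep pipeline above, assuming the log lines have exactly the "Interim result: ... over N seconds" shape shown here. The function name slow_intervals and the 3.0-second threshold are illustrative choices, not part of the original test. Unlike the fgrep filter, a numeric threshold would not report intervals shorter than 1 second.

```python
import re

# Matches lines shaped like the ones in this message, e.g.
#   "Interim result:  299.37 10^6bits/s over 7.34 seconds"
INTERIM_RE = re.compile(
    r"Interim result:\s+([\d.]+)\s+10\^6bits/s over ([\d.]+) seconds"
)


def slow_intervals(lines, threshold=3.0):
    """Return (throughput, seconds) pairs whose duration is >= threshold.

    threshold is a hypothetical cutoff; 3.0 approximates the
    'not 1.x and not 2.x' filter used with fgrep.
    """
    slow = []
    for line in lines:
        match = INTERIM_RE.search(line)
        if match and float(match.group(2)) >= threshold:
            slow.append((float(match.group(1)), float(match.group(2))))
    return slow


# The six slow results quoted in this message.
sample = [
    "Interim result:   45.29 10^6bits/s over 53.21 seconds",
    "Interim result:  299.37 10^6bits/s over 7.34 seconds",
    "Interim result:  406.76 10^6bits/s over 5.84 seconds",
    "Interim result:  614.00 10^6bits/s over 3.91 seconds",
    "Interim result:  579.55 10^6bits/s over 4.06 seconds",
    "Interim result:  239.60 10^6bits/s over 10.19 seconds",
]

if __name__ == "__main__":
    for mbps, seconds in slow_intervals(sample):
        print(f"{seconds:6.2f} s at {mbps:7.2f} 10^6bits/s")
```

To run it against the real log, pass an open file instead of the sample list, for example slow_intervals(open("screenlog.7")).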

Scott 

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] 
> Sent: Thursday, April 19, 2007 8:27 PM
> To: Scott Weitzenkamp (sweitzen)
> Cc: EWG; Roland Dreier (rdreier); Michael S. Tsirkin; Sean Hefty; openib
> Subject: Re: Slow failover of IPoIB ipoibtools/bonding (bug 541)
> 
> > Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> > Subject: Slow failover of IPoIB ipoibtools/bonding (bug 541)
> > 
> > Roland, Michael, or Sean, this is what I see when IPoIB failover is slow.
> > How do we get this fixed?
> >  
> >  
> > ib0: Request connection 0x60406 for gid fe80:0000:0000:0000:0002:c902:0020:e1d9 qpn 0x404
> > ib0: REP received.
> > ib0: REQ arrived
> > ib0: failed cm send event (status=12, wrid=45 vend_err 81)
> > ib0: Destroy active connection 0x60406 head 0x6546f tail 0x6546e
> > ib0: Request connection 0x70406 for gid fe80:0000:0000:0000:0002:c902:0020:e1d9 qpn 0x404
> 
> Scott, this is a result of the port going down; the message is benign.
> For simplicity, could you please check whether slow failover is observed
> with datagram mode? This takes a couple of variables out of the equation.
> 
> -- 
> MST
> 


