[openib-general] Re: ib_sdp ERR: IOCB dmesg output

Michael S. Tsirkin mst at mellanox.co.il
Wed Jan 11 00:06:31 PST 2006


Quoting Grant Grundler <iod00d at hp.com>:
> Subject: Re: ib_sdp ERR: IOCB dmesg output
> 
> On Sun, Dec 11, 2005 at 09:53:41AM -0800, Grant Grundler wrote:
> ...
> > I might have spoken too soon...I just started getting "ERR" output
> > from ib_sdp running netperf TCP_STREAM over SDP on the IA64 rx2600's.
> > I killed and restarted the "sdpstream" script. It seems to be working.
> > 
> > I've not yet seen this type of error running r4344 on a different box.
> > If it's not obvious what's wrong, I can try r4344 on the rx2600's as well.
> ...
> > ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197>
> > ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384>
> > ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152>
> 
> I'm still seeing similar errors with 2.6.15 + svn 4800 and have another
> bit of data. The main problem is the impact on performance:
> 	http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdpstream.png
> 
> I've parked the dmesg output here:
> 	http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdp-errors
> 
> After loading the drivers and iteratively running netperf (with
> LD_PRELOAD) to generate the data points, I tried to unload all of the
> IB modules but ended up with:
> gsyprf3:~# lsmod
> Module                  Size  Used by
> ib_sdp                227136  9 
> ib_cm                  93964  1 ib_sdp
> ib_sa                  25324  1 ib_sdp
> ib_mad                 85952  2 ib_cm,ib_sa
> ib_core                93096  4 ib_sdp,ib_cm,ib_sa,ib_mad
> 
> I'm not sure who is holding the reference counts on ib_sdp.
> At this point no netperf processes are running. But some workqueue
> threads still have references (as root, "lsof | fgrep sdp"):
> sdp_wq/0  3893     root  cwd       DIR                8,3    4096          2 /
> sdp_wq/0  3893     root  rtd       DIR                8,3    4096          2 /
> sdp_wq/0  3893     root  txt   unknown                                       /proc/3893/exe
> sdp_wq/1  3894     root  cwd       DIR                8,3    4096          2 /
> sdp_wq/1  3894     root  rtd       DIR                8,3    4096          2 /
> sdp_wq/1  3894     root  txt   unknown                                       /proc/3894/exe
> 
> grundler@gsyprf3:~$ ps -ef | grep sdp
> root      3893    11  0 Jan08 ?        00:00:00 [sdp_wq/0]
> root      3894    11  0 Jan08 ?        00:00:00 [sdp_wq/1]
> 
> 
> It's likely the userspace openib libs are out of sync.
> But I don't expect that's relevant to SDP or IPoIB (kernel drivers).

No.

> This is in contrast to another box running identical kernel + modules:
> iowa:~# lsmod
> Module                  Size  Used by
> ib_uverbs              93096  0 
> ib_sdp                227136  0 
> ib_cm                  93964  1 ib_sdp
> ib_ipoib               95992  0 
> ib_sa                  25324  2 ib_sdp,ib_ipoib
> ib_mthca              275136  0 
> ib_mad                 85952  3 ib_cm,ib_sa,ib_mthca
> ib_core                93096  7 ib_uverbs,ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_mthca,ib_mad
> 
> "iowa" was the target of netperf on gsyprf3 (i.e. iowa was running netserver
> with LD_PRELOAD as well).
> 
> Given the number of recent bug fixes since 4800, I will update and
> try again later this week.
> 
> thanks,
> grant
> 

Could you please try the SDP patches from
https://openib.org/svn/trunk/contrib/mellanox/patches
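
When comparing the two lsmod listings, it may help to separate references held by other modules from references held by something else (open sockets, for instance). The sketch below is only an illustration, not part of the SDP code. It assumes each dependent module named in the "Used by" column accounts for exactly one reference. The helper names parse_lsmod and unexplained_refs are made up for this example.

```python
def parse_lsmod(text):
    """Parse lsmod output into {module: (use_count, [dependent modules])}."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the "Module Size Used by" header and any malformed line.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        name, use_count = fields[0], int(fields[2])
        # The fourth column, when present, is a comma-separated module list.
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[name] = (use_count, users)
    return modules


def unexplained_refs(modules):
    """Return {module: n} for references not explained by listed modules.

    Assumption: each listed dependent module holds exactly one reference.
    """
    return {
        name: count - len(users)
        for name, (count, users) in modules.items()
        if count > len(users)
    }


# The lsmod output from gsyprf3, as quoted above.
SAMPLE = """\
Module                  Size  Used by
ib_sdp                227136  9
ib_cm                  93964  1 ib_sdp
ib_sa                  25324  1 ib_sdp
ib_mad                 85952  2 ib_cm,ib_sa
ib_core                93096  4 ib_sdp,ib_cm,ib_sa,ib_mad
"""

if __name__ == "__main__":
    print(unexplained_refs(parse_lsmod(SAMPLE)))
```

On the gsyprf3 listing this reports only ib_sdp, with all 9 references unexplained by other modules, which suggests looking for leftover connections rather than at module dependencies.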

-- 
MST


