[openib-general] Re: ib_sdp ERR: IOCB dmesg output

Grant Grundler iod00d at hp.com
Tue Jan 10 17:56:23 PST 2006


On Sun, Dec 11, 2005 at 09:53:41AM -0800, Grant Grundler wrote:
...
> I might have spoken too soon...I just started getting "ERR" output
> from ib_sdp running netperf TCP_STREAM over SDP on the IA64 rx2600's.
> I killed and restarted the "sdpstream" script. It seems to be working.
> 
> I've not yet seen this type of error running r4344 on a different box.
> If it's not obvious what's wrong, I can try r4344 on the rx2600's as well.
...
> ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197>
> ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384>
> ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152>

I'm still seeing similar errors with 2.6.15 + svn 4800 and have one more
data point. The main problem is the impact on performance:
	http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdpstream.png

I've parked the dmesg output here:
	http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdp-errors
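
For anyone who wants to summarize a log like that one, here is a minimal
sketch that tallies the ib_sdp IOCB error lines. It assumes the lines have
the same layout as the three quoted above. The group names in the regex
(status, cancel, flag, a, b, c) are labels made up for this example; this
message does not say what each field means.

```python
import re
from collections import Counter

# Matches lines shaped like the ib_sdp errors quoted in this message.
# Group names are illustrative labels only, not ib_sdp's own terminology.
IOCB_ERR = re.compile(
    r"ib_sdp\s+ERR: IOCB <(?P<status>-?\d+)> cancel <(?P<cancel>\d+)> "
    r"flag <(?P<flag>[0-9a-fA-F]+)> size <(?P<a>\d+):(?P<b>\d+):(?P<c>\d+)>"
)

def tally_iocb_errors(dmesg_text):
    """Count ib_sdp IOCB error lines, keyed by the first size field."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = IOCB_ERR.search(line)
        if m:
            counts[int(m.group("a"))] += 1
    return counts

# Sample input: the three lines quoted earlier in this message.
sample = """\
ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197>
ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384>
ib_sdp  ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152>
"""

for size, n in sorted(tally_iocb_errors(sample).items()):
    print(size, n)
```

Feeding it the saved dmesg text instead of the sample would show which
sizes the errors cluster around, if any.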

After loading the drivers and iteratively running netperf (with LD_PRELOAD)
to generate the data points, I tried to unload all of the IB modules
but ended up with:
gsyprf3:~# lsmod
Module                  Size  Used by
ib_sdp                227136  9 
ib_cm                  93964  1 ib_sdp
ib_sa                  25324  1 ib_sdp
ib_mad                 85952  2 ib_cm,ib_sa
ib_core                93096  4 ib_sdp,ib_cm,ib_sa,ib_mad

I'm not sure what is holding the references on ib_sdp.
At this point no netperf processes are running, but some workqueue (wq)
threads still have references (as root, "lsof | fgrep sdp"):
sdp_wq/0  3893     root  cwd       DIR                8,3    4096          2 /
sdp_wq/0  3893     root  rtd       DIR                8,3    4096          2 /
sdp_wq/0  3893     root  txt   unknown                                       /proc/3893/exe
sdp_wq/1  3894     root  cwd       DIR                8,3    4096          2 /
sdp_wq/1  3894     root  rtd       DIR                8,3    4096          2 /
sdp_wq/1  3894     root  txt   unknown                                       /proc/3894/exe

grundler at gsyprf3:~$ ps -ef | grep sdp
root      3893    11  0 Jan08 ?        00:00:00 [sdp_wq/0]
root      3894    11  0 Jan08 ?        00:00:00 [sdp_wq/1]


It's likely the userspace openib libs are out of sync.
But I don't expect that's relevant to SDP or IPoIB (kernel drivers).

This is in contrast to another box running identical kernel + modules:
iowa:~# lsmod
Module                  Size  Used by
ib_uverbs              93096  0 
ib_sdp                227136  0 
ib_cm                  93964  1 ib_sdp
ib_ipoib               95992  0 
ib_sa                  25324  2 ib_sdp,ib_ipoib
ib_mthca              275136  0 
ib_mad                 85952  3 ib_cm,ib_sa,ib_mthca
ib_core                93096  7 ib_uverbs,ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_mthca,ib_mad
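
The comparison between the two boxes can be made mechanical. Below is a
rough sketch that subtracts the number of modules listed in the "Used by"
column from the use count, so whatever is left over is a reference not
explained by another module. It assumes the use count is the listed
dependent modules plus any other holders, and that the text is in the
lsmod layout shown in this message. The function name is invented for
this example.

```python
def unexplained_refs(lsmod_text):
    """Return {module: use_count - number_of_listed_users} per module."""
    result = {}
    for line in lsmod_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, use_count = fields[0], int(fields[2])
        # Fourth column, when present, is a comma-separated module list.
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        result[name] = use_count - len(users)
    return result

# Sample input: the lsmod listing from gsyprf3 earlier in this message.
gsyprf3 = """\
Module                  Size  Used by
ib_sdp                227136  9
ib_cm                  93964  1 ib_sdp
ib_sa                  25324  1 ib_sdp
ib_mad                 85952  2 ib_cm,ib_sa
ib_core                93096  4 ib_sdp,ib_cm,ib_sa,ib_mad
"""

for mod, extra in unexplained_refs(gsyprf3).items():
    if extra:
        print(mod, extra)
```

On the gsyprf3 listing only ib_sdp is left with references that no listed
module accounts for (all 9 of them); the same check on the iowa listing
leaves nothing.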

"iowa" was the target of netperf on gsyprf3 (i.e. iowa was running netserver
with LD_PRELOAD as well).

Given the number of recent bug fixes since 4800, I will update and
try again later this week.

thanks,
grant


