[openib-general] Re: ib_sdp ERR: IOCB dmesg output
Grant Grundler
iod00d at hp.com
Tue Jan 10 17:56:23 PST 2006
On Sun, Dec 11, 2005 at 09:53:41AM -0800, Grant Grundler wrote:
...
> I might have spoken too soon...I just started getting "ERR" output
> from ib_sdp running netperf TCP_STREAM over SDP on the IA64 rx2600's.
> I killed and restarted the "sdpstream" script. It seems to be working.
>
> I've not yet seen this type of error running r4344 on a different box.
> If it's not obvious what's wrong, I can try r4344 on the rx2600's as well.
...
> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197>
> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384>
> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152>
I'm still seeing similar errors with 2.6.15 + svn 4800 and have another
bit of data. The main problem is the impact on performance:
http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdpstream.png
I've parked the dmesg output here:
http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdp-errors
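For anyone who wants to summarize a dmesg capture like that one, here is a minimal sketch that tallies the "ib_sdp ERR: IOCB" lines by flag and size. The regular expression is only inferred from the three lines quoted above; the field names (iocb, cancel, flag, size) simply mirror the labels in the message and say nothing about what the driver means by them. The function name summarize_iocb_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the quoted dmesg lines; not taken from the ib_sdp source.
IOCB_RE = re.compile(
    r"ib_sdp ERR: IOCB <(?P<iocb>-?\d+)> cancel <(?P<cancel>\d+)> "
    r"flag <(?P<flag>\w+)> size <(?P<size>\d+:\d+:\d+)>"
)


def summarize_iocb_errors(log_text):
    """Count IOCB error lines, keyed by (flag, size) as printed in the log."""
    counts = Counter()
    for match in IOCB_RE.finditer(log_text):
        counts[(match.group("flag"), match.group("size"))] += 1
    return counts


SAMPLE = """\
ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197>
ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384>
ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152>
"""

for (flag, size), n in sorted(summarize_iocb_errors(SAMPLE).items()):
    print(flag, size, n)
```

Feeding it the full dmesg text instead of SAMPLE would show whether the errors cluster on particular transfer sizes.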
After loading the drivers and iteratively running netperf (with
LD_PRELOAD) to generate the data points, I tried to unload all of the
IB modules but ended up with:
gsyprf3:~# lsmod
Module                  Size  Used by
ib_sdp                227136  9
ib_cm                  93964  1 ib_sdp
ib_sa                  25324  1 ib_sdp
ib_mad                 85952  2 ib_cm,ib_sa
ib_core                93096  4 ib_sdp,ib_cm,ib_sa,ib_mad
I'm not sure who is holding the reference counts to ib_sdp.
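A rough way to see which use counts are not explained by other modules is to compare each module's "Used by" count against the modules listed after it. The sketch below does that on the lsmod output above. It assumes each listed dependent module accounts for exactly one reference, which is a simplification; the helper names parse_lsmod and unexplained_refs are hypothetical.

```python
def parse_lsmod(text):
    """Parse lsmod output into {module: (size, refcount, [dependent modules])}."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the "Module Size Used by" header and any malformed line.
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
    return modules


def unexplained_refs(modules):
    """Return {module: count} of references not matched by a listed module.

    Assumption: one reference per listed dependent module.
    """
    return {
        name: refs - len(users)
        for name, (_size, refs, users) in modules.items()
        if refs > len(users)
    }


LSMOD = """\
Module Size Used by
ib_sdp 227136 9
ib_cm 93964 1 ib_sdp
ib_sa 25324 1 ib_sdp
ib_mad 85952 2 ib_cm,ib_sa
ib_core 93096 4 ib_sdp,ib_cm,ib_sa,ib_mad
"""

print(unexplained_refs(parse_lsmod(LSMOD)))  # prints {'ib_sdp': 9}
```

Under that assumption, only ib_sdp carries references (all 9) that no loaded module accounts for, so the holders are something other than module dependencies.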
At this point no netperf processes are running. But some wq still
have references (as root, "lsof | fgrep sdp"):
sdp_wq/0 3893 root cwd DIR 8,3 4096 2 /
sdp_wq/0 3893 root rtd DIR 8,3 4096 2 /
sdp_wq/0 3893 root txt unknown /proc/3893/exe
sdp_wq/1 3894 root cwd DIR 8,3 4096 2 /
sdp_wq/1 3894 root rtd DIR 8,3 4096 2 /
sdp_wq/1 3894 root txt unknown /proc/3894/exe
grundler at gsyprf3:~$ ps -ef | grep sdp
root 3893 11 0 Jan08 ? 00:00:00 [sdp_wq/0]
root 3894 11 0 Jan08 ? 00:00:00 [sdp_wq/1]
It's likely the userspace openib libs are out of sync.
But I don't expect that's relevant to SDP or IPoIB (kernel drivers).
This is in contrast to another box running identical kernel + modules:
iowa:~# lsmod
Module                  Size  Used by
ib_uverbs              93096  0
ib_sdp                227136  0
ib_cm                  93964  1 ib_sdp
ib_ipoib               95992  0
ib_sa                  25324  2 ib_sdp,ib_ipoib
ib_mthca              275136  0
ib_mad                 85952  3 ib_cm,ib_sa,ib_mthca
ib_core                93096  7 ib_uverbs,ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_mthca,ib_mad
"iowa" was the target of netperf on gsyprf3 (i.e. iowa was running netserver
with LD_PRELOAD as well).
Given the number of recent bug fixes since 4800, I will update and
try again later this week.
thanks,
grant