[openib-general] Re: ib_sdp ERR: IOCB dmesg output
Michael S. Tsirkin
mst at mellanox.co.il
Wed Jan 11 00:06:31 PST 2006
Quoting r. Grant Grundler <iod00d at hp.com>:
> Subject: Re: ib_sdp ERR: IOCB dmesg output
>
> On Sun, Dec 11, 2005 at 09:53:41AM -0800, Grant Grundler wrote:
> ...
> > I might have spoken too soon...I just started getting "ERR" output
> > from ib_sdp running netperf TCP_STREAM over SDP on the IA64 rx2600's.
> > I killed and restarted the "sdpstream" script. It seems to be working.
> >
> > I've not yet seen this type of error running r4344 on a different box.
> > If it's not obvious what's wrong, I can try r4344 on the rx2600's as well.
> ...
> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197>
> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384>
> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152>
>
> I'm still seeing similar errors with 2.6.15 + svn 4800 and have another
> bit of data. Main problem is impact to performance:
> http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdpstream.png
>
> I've parked the dmesg output here:
> http://gsyprf3.external.hp.com/openib/rx2600-r4800/sdp-errors
>
> After loading the drivers, iteratively running netperf to generate
> the data points (with LD_PRELOAD), I tried to unload all of IB modules
> but end up with:
> gsyprf3:~# lsmod
> Module                  Size  Used by
> ib_sdp                227136  9
> ib_cm                  93964  1 ib_sdp
> ib_sa                  25324  1 ib_sdp
> ib_mad                 85952  2 ib_cm,ib_sa
> ib_core                93096  4 ib_sdp,ib_cm,ib_sa,ib_mad
>
> I'm not sure who is holding the reference counts to ib_sdp.
> At this point no netperf processes are running. But some wq still
> have references (as root, "lsof | fgrep sdp"):
> sdp_wq/0  3893 root  cwd  DIR      8,3  4096  2 /
> sdp_wq/0  3893 root  rtd  DIR      8,3  4096  2 /
> sdp_wq/0  3893 root  txt  unknown                 /proc/3893/exe
> sdp_wq/1  3894 root  cwd  DIR      8,3  4096  2 /
> sdp_wq/1  3894 root  rtd  DIR      8,3  4096  2 /
> sdp_wq/1  3894 root  txt  unknown                 /proc/3894/exe
>
> grundler at gsyprf3:~$ ps -ef | grep sdp
> root 3893 11 0 Jan08 ? 00:00:00 [sdp_wq/0]
> root 3894 11 0 Jan08 ? 00:00:00 [sdp_wq/1]
>
>
> It's likely the userspace openib libs are out of sync.
> But I don't expect that's relevant to SDP or IPoIB (kernel drivers).
No.
> This is in contrast to another box running identical kernel + modules:
> iowa:~# lsmod
> Module                  Size  Used by
> ib_uverbs              93096  0
> ib_sdp                227136  0
> ib_cm                  93964  1 ib_sdp
> ib_ipoib               95992  0
> ib_sa                  25324  2 ib_sdp,ib_ipoib
> ib_mthca              275136  0
> ib_mad                 85952  3 ib_cm,ib_sa,ib_mthca
> ib_core                93096  7 ib_uverbs,ib_sdp,ib_cm,ib_ipoib,ib_sa,ib_mthca,ib_mad
>
> "iowa" was the target of netperf on gsyprf3 (i.e. iowa was running netserver
> with LD_PRELOAD as well).
>
> Given the number of recent bug fixes since 4800, I will update and
> try again later this week.
>
> thanks,
> grant
>
Could you please try sdp patches from
https://openib.org/svn/trunk/contrib/mellanox/patches
--
MST
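
For anyone comparing the two lsmod listings quoted above, here is a rough
sketch (not part of the SDP patches, and not an existing tool) of one way to
spot a module whose use count is larger than the number of modules listed as
using it. The function name unexplained_refs is made up for illustration, and
the sample text is copied from the gsyprf3 listing. The assumption is that a
use count not covered by the "Used by" list is held by something other than a
module, for example sockets that were never released.

```python
# Sketch only: compare each module's use count with the number of
# dependent modules that lsmod lists for it.
LSMOD_SAMPLE = """\
Module Size Used by
ib_sdp 227136 9
ib_cm 93964 1 ib_sdp
ib_sa 25324 1 ib_sdp
ib_mad 85952 2 ib_cm,ib_sa
ib_core 93096 4 ib_sdp,ib_cm,ib_sa,ib_mad
"""


def unexplained_refs(lsmod_text):
    """Return {module: count} for references not held by listed modules.

    Hypothetical helper: it only does arithmetic on lsmod's text output
    and cannot say who actually holds the extra references.
    """
    result = {}
    for line in lsmod_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # blank line, or a count such as "-" that is not a number
        name, count = fields[0], int(fields[2])
        # The optional 4th field is a comma-separated list of dependents.
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        extra = count - len(users)
        if extra > 0:
            result[name] = extra
    return result


if __name__ == "__main__":
    for mod, extra in unexplained_refs(LSMOD_SAMPLE).items():
        print(f"{mod}: {extra} reference(s) not held by another module")
```

On the gsyprf3 listing this flags only ib_sdp, which matches the report that
ib_sdp is the module that cannot be unloaded; on the iowa listing every count
is covered by the listed modules.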