[openib-general] Re: netperf over SDP bug
Grant Grundler
iod00d at hp.com
Wed Sep 28 18:16:46 PDT 2005
On Wed, Sep 28, 2005 at 06:52:03AM +0300, Michael S. Tsirkin wrote:
...
> > grundler <505>grep -h ' 60\.0. ' sdp*.out | sort -n -k 3
BTW, I should also mention I'm running with HZ set at the new "default":
gsyprf3:/home/grundler/openib-perf-2005/r3547# fgrep HZ /boot/config-2.6.13
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
If there is a workqueue/task scheduling problem, it's likely
to be related to that setting.
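FWIW, a rough way to sanity-check the effective tick rate at runtime (just a
sketch; it assumes the timer interrupt line in /proc/interrupts is labelled
"timer" on this platform) would be something like:

    # count timer interrupts on CPU0 over a 10 second window
    t0=$(awk '/timer/ {print $2; exit}' /proc/interrupts)
    sleep 10
    t1=$(awk '/timer/ {print $2; exit}' /proc/interrupts)
    echo "ticks/sec ~ $(( (t1 - t0) / 10 ))"   # should land near 250 here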
> Interesting.
> Things to try would be
> - enable SDP debug, or event data path debug
Can you be more specific?
I have CONFIG_SDP_DEBUG enabled and it's dumping setup/teardown info.
Did you want to see that?
> - try running with oprofile, see what is CPU doing
After looking at this, I'm not convinced it's a useful approach.
I.e., we're only going to see the idle loop a lot...
I need to rebuild my kernel to enable CONFIG_PROFILING.
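Once CONFIG_PROFILING is in, the sequence would be roughly the usual
opcontrol/opreport dance (the vmlinux path below is a guess for this box;
the netperf invocation is the same one as further down):

    opcontrol --vmlinux=/boot/vmlinux-2.6.13   # point oprofile at the kernel image
    opcontrol --start                          # start sampling
    LD_PRELOAD=/usr/local/lib/libsdp.so /usr/local/bin/netperf -p 12866 -l 60 \
        -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 512 -s 16384 -S 16384
    opcontrol --stop                           # stop sampling
    opreport -l | head -40                     # top symbols, kernel + user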
Being a lazy slob, I first tried qprof and here's what it said:
gsyprf3:~# LD_PRELOAD=/usr/local/lib/libsdp.so qprof /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 512 -s 16384 -S 16384
bind_to_specific_processor: sched_setaffinity
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.30 (10.0.0.30) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 32768  32768    512    60.01      375.49
qprof: /usr/local/bin/netperf: 285 samples, 285 counts
libc.so.6.1(send)                       3 (  1%)
netperf                                 1 (  0%)
send_tcp_stream:nettest_bsd.c:1018      1 (  0%)
send_tcp_stream:nettest_bsd.c:1449      2 (  1%)
send_tcp_stream:nettest_bsd.c:1451      5 (  2%)
[0xa000000000010620]                    2 (  1%)
[0xa000000000010621]                  271 ( 95%)
Shared libs are at 0x2...
0xa... sounds like a kernel address:
a000000000010620 A __kernel_syscall_via_break
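(For reference, one way to map an address like that back to a kernel symbol,
assuming the matching System.map for this kernel is installed under /boot:)

    grep a000000000010620 /boot/System.map-2.6.13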
Conclusion - qprof doesn't seem very helpful in this case.
q-syscollect seems more useful, but it spews a ton of files that
ideally need to be merged (rough loop sketch after the listing):
gsyprf3:~/.q# ls
apache2-pid1893-cpu1.edge#0 klogd-pid1577-cpu1.info#0
apache2-pid1893-cpu1.hist#0 master-pid1834-cpu1.edge#0
apache2-pid1893-cpu1.info#0 master-pid1834-cpu1.hist#0
bash-pid1906-cpu1.edge#0 master-pid1834-cpu1.info#0
bash-pid1906-cpu1.hist#0 migration_0-pid2-cpu0.edge#0
...
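Short of a real merge, a crude way to at least dump them all in one pass
would be a loop like this (sketch only; assumes the default ~/.q output
directory shown above):

    # run q-view over every per-process info file and collect the output
    for f in ~/.q/*.info#0; do
        echo "=== $f ==="
        q-view "$f"
    done > /tmp/q-view-all.txt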
Care to see q-view output for any of those?
e.g.:
# q-view unknown-cpu1.info#0
Flat profile of CPU_CYCLES in unknown-cpu1.hist#0:
Each histogram sample counts as 1.00034m seconds
% time      self     cumul     calls self/call  tot/call  name
 99.97     59.70     59.70       642     93.0m     93.0m  default_idle
  0.02      0.01     59.71       430     27.9u     27.9u  mdio_ctrl
  0.00      0.00     59.71     55.7k     18.0n     18.0n  __sched_text_end
  0.00      0.00     59.71      51.0     19.6u     22.8u  ip_local_deliver
  0.00      0.00     59.72       894     1.12u     1.12u  __copy_user
  0.00      0.00     59.72     65.2k     15.3n     29.5n  handle_IRQ_event
  0.00      0.00     59.72      25.0     40.0u     40.0u  copy_page
  0.00      0.00     59.72       198     5.05u     12.2u  tg3_poll
  0.00      0.00     59.72      27.0     37.0u     37.0u  proc_lookup
  0.00      0.00     59.72      233k      0.00      0.00  __udivdi3
  0.00      0.00     59.72     82.7k      0.00      0.00  pfm_interrupt_handler
  0.00      0.00     59.72     66.2k      0.00      247n  ia64_handle_irq
  0.00      0.00     59.72     54.8k      0.00      0.00  __find_next_bit
  0.00      0.00     59.72     50.7k      0.00      0.00  lsapic_noop
  0.00      0.00     59.72     45.6k      0.00     42.4n  __do_IRQ
  0.00      0.00     59.72     43.7k      0.00      330n  irq_exit
  0.00      0.00     59.72     40.4k      0.00      0.00  sched_clock
  0.00      0.00     59.72     19.6k      0.00      0.00  run_local_timers
  0.00      0.00     59.72     5.38k      0.00      0.00  _spin_unlock
...
Not much more helpful either.
That's why I'm skeptical oprofile output would be better.
thanks,
grant