[openib-general] ia64 perf and FMR
Grant Grundler
iod00d at hp.com
Fri Apr 1 18:40:48 PST 2005
Hi,
Just wanted to share the initial perf results (and a surprise)
that I'm getting on the HP ZX1/IA64 boxes.
Before FMR support was committed, netperf was reporting around
1720 Mb/s (215 MB/s) for IPoIB with msi_x=1 and netserver pinned
to the CPU that wasn't taking interrupts. After FMR was committed,
netperf is reporting about 3500 Mb/s (437 MB/s) for IPoIB. CPU was
saturated on the send side in all cases.
I've a vague idea what "Fast Memory Registration" is but not a good
understanding. Can someone point me at a decent explanation of FMR?
I'd like to understand the 2X gain in performance.
Maybe we are doing half as much DMA mapping because of
one of the bug fixes?
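For anyone else wondering, here's roughly what a consumer of the
gen2 fmr_pool interface looks like, as far as I can tell from
include/ib_fmr_pool.h (a sketch only -- exact fields and signatures
should be checked against the real header):

/* Sketch of FMR pool usage, from my reading of
 * include/ib_fmr_pool.h -- error handling trimmed. */
#include <linux/err.h>
#include <ib_verbs.h>
#include <ib_fmr_pool.h>

static struct ib_fmr_pool *pool;

static int create_pool(struct ib_pd *pd)
{
	struct ib_fmr_pool_param params = {
		.max_pages_per_fmr = 64,
		.access            = IB_ACCESS_LOCAL_WRITE,
		.pool_size         = 32,
		.dirty_watermark   = 8,
		.cache             = 1,
	};

	pool = ib_create_fmr_pool(pd, &params);
	return IS_ERR(pool) ? PTR_ERR(pool) : 0;
}

/* The win: map/unmap reuses a pre-allocated FMR and defers the
 * real unmap until the dirty watermark is hit, instead of a full
 * register/deregister round trip per I/O. */
static struct ib_pool_fmr *map_io(u64 *pages, int npages, u64 iova)
{
	return ib_fmr_pool_map_phys(pool, pages, npages, iova);
}

static void done_io(struct ib_pool_fmr *fmr)
{
	ib_fmr_pool_unmap(fmr);
}

If that's roughly right, the speedup would come from taking the
registration cost out of the data path.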
And I'm suspicious of the IPoIB numbers since SDP is also seeing
a bit over 3500 Mb/s and sending CPU is also saturated. I was hoping
SDP would be 40-60% faster than TCP (ipoib). Maybe I'm just not
configuring libsdp.conf correctly for netperf and maybe the IPoIB
numbers are correct. I've run "rmmod ib_sdp" on both boxes, unloaded
and reloaded all the other IB drivers, and run "unset LD_PRELOAD".
Is unloading ib_sdp sufficient to be sure SDP isn't used?
(I do get "module in use" when netserver is running with LD_PRELOAD
pointing at libsdp.so)
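One way to double-check: with ib_sdp unloaded, creating an SDP
socket directly should fail with EAFNOSUPPORT. A quick test program
(AF_INET_SDP = 27 is my recollection of the address family libsdp
uses -- verify against the SDP headers):

#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

#define AF_INET_SDP 27	/* assumed value -- check the SDP headers */

int main(void)
{
	int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);

	if (fd < 0) {
		/* Expected when ib_sdp is unloaded: EAFNOSUPPORT,
		 * i.e. nothing can be talking SDP behind your back. */
		perror("socket(AF_INET_SDP)");
		return 0;
	}
	printf("SDP socket created -- ib_sdp is still active\n");
	close(fd);
	return 0;
}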
I also reviewed all the "__attribute__ ((packed))" uses in
include/ib_mad.h and include/ib_smi.h. It looks safe to me
to remove them, since every field is "naturally" aligned from
the start of its respective structure. I also checked the
nested cases. Removing every use from those two files worked
fine, but it made no difference to netperf TCP_STREAM.
I didn't realize other files also use "packed", so I'll
have to revisit the issue. I'm mostly worried that some
new use will not be well aligned and cause the compiler
to insert padding. That would be a PITA to debug.
What we need is a compiler warning to tell us when/where
padding is inserted in a structure with a similar __attribute__.
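Until such a warning exists, the closest substitute I know of is a
compile-time size check. A sketch with a MAD-header-like struct
(the layout below is my reading of ib_mad.h; 24 bytes is what
natural alignment should yield):

#include <stdint.h>

/* MAD-header-like example: every field is naturally aligned
 * relative to the start of the struct, so gcc inserts no padding
 * and "packed" is redundant. */
struct example_mad_hdr {
	uint8_t  base_version;   /* offset  0 */
	uint8_t  mgmt_class;     /* offset  1 */
	uint8_t  class_version;  /* offset  2 */
	uint8_t  method;         /* offset  3 */
	uint16_t status;         /* offset  4 */
	uint16_t class_specific; /* offset  6 */
	uint64_t tid;            /* offset  8 */
	uint16_t attr_id;        /* offset 16 */
	uint16_t resv;           /* offset 18 */
	uint32_t attr_mod;       /* offset 20 */
};                               /* sizeof == 24 */

/* Compile-time check: the array size goes negative (a hard error)
 * if the struct ever grows padding. */
typedef char mad_hdr_size_check[sizeof(struct example_mad_hdr) == 24 ? 1 : -1];

The kernel's BUILD_BUG_ON() does the same thing; a check like this
next to each formerly-packed struct would catch any padding a new
field sneaks in.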
Reminder: not pinning the netserver thread to the other CPU
costs around 25% in performance. I think that's true for any
single-threaded networking perf test that saturates the CPU.
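For anyone reproducing the pinned numbers, a minimal sketch using
sched_setaffinity() (CPU 1 is an assumption -- use whichever CPU
isn't taking the HCA interrupts per /proc/interrupts):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(1, &mask);	/* assumed: CPU 1 is not taking IRQs */

	if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
		perror("sched_setaffinity");
		return 1;
	}
	/* ... exec netserver or run the benchmark from here ... */
	return 0;
}

taskset(1) from schedutils does the same thing from the shell.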
thanks,
grant