[openib-general] How about ib_send_page() ?
Grant Grundler
iod00d at hp.com
Tue May 17 19:08:16 PDT 2005
On Tue, May 17, 2005 at 06:32:38PM -0700, Jeff Carr wrote:
> >>>But IPoIB can't really implement NAPI since it's sending work to
> >>>a shared HCA.
>
> Hmm. I'm not knowledgeable enough to know why; I'll have to take your
> word for it. I'm not sure yet of all the conditions under which the HCA
> can generate interrupts.
Wellll.. Looks like I'm wrong. Previous email on this thread, from
people who know a lot more about it than I do, suggested it's possible.
But I'm still concerned it's going to affect latency.
>
> But if I sit back and look at the logic of this argument, it seemed
> like:
>
> Hey, is there a way to not generate so many interrupts?
> That's handled by NAPI
> OK. That looks interesting.
Right - but that's the generic "this is how linux deals with this" argument.
> But, we can't do NAPI because we can't just disable interrupts.
Sorry - seems like I'm assuming too much about the capabilities
of the HCAs.
> Darn.
> But wait, why can't we just not generate interrupts in the first place then?
>
> Isn't that what the midas touch of netdev->poll() really is? e1000 has:
> quit_polling: netif_rx_complete(netdev);
> e1000_irq_enable(adapter);
I'm more familiar with the tg3 driver. The disadvantage to NAPI in the
tg3 implementation is that it *always* disables interrupts on the card
before calling netif_rx_schedule(). Then it lets the OS decide when to
actually process those packets *already* received, in a safer
context... then once tg3 decides it's done all the work,
it re-enables the interrupts. Just like e1000 does above.
There are some workloads where the PCI bus utilization is "suboptimal"
because the enable/disable of interrupts interferes with the DMA flows
and costs excessive overhead.
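To make that concrete, here's roughly what that dance looks like with
the old 2.6-style netdev->poll() interface. This is a sketch from
memory, not actual code from either driver - the my_* names
(my_adapter, my_disable_irqs(), my_clean_rx(), etc.) are placeholders:

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/interrupt.h>

/* Placeholder per-device state; a real driver keeps the RX ring,
 * MMIO window, etc. in here. */
struct my_adapter {
	void __iomem *regs;
};

/* These would be writel()s to the card's interrupt mask register -
 * the PCI traffic that interferes with the DMA flows mentioned above. */
static void my_disable_irqs(struct my_adapter *ap) { /* mask card IRQs */ }
static void my_enable_irqs(struct my_adapter *ap)  { /* unmask card IRQs */ }

/* Reap up to work_to_do packets that the card has already DMA'd into
 * host memory and feed them up the stack. */
static void my_clean_rx(struct my_adapter *ap, int *work_done, int work_to_do)
{
	/* walk the RX descriptor ring here */
}

/* Hard IRQ handler: mask the card and defer everything to ->poll(). */
static irqreturn_t my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;

	my_disable_irqs(netdev_priv(dev));	/* MMIO write #1 */
	netif_rx_schedule(dev);			/* ask the stack to call ->poll() */
	return IRQ_HANDLED;
}

/* netdev->poll(): runs in softirq context, bounded by the budget. */
static int my_poll(struct net_device *dev, int *budget)
{
	struct my_adapter *ap = netdev_priv(dev);
	int work_to_do = min(*budget, dev->quota);
	int work_done = 0;

	my_clean_rx(ap, &work_done, work_to_do);

	*budget -= work_done;
	dev->quota -= work_done;

	if (work_done < work_to_do) {		/* ring drained */
		netif_rx_complete(dev);
		my_enable_irqs(ap);		/* MMIO write #2 */
		return 0;			/* done polling */
	}
	return 1;				/* more work, poll again */
}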
> Maybe IB can mimic the concept here by acting intelligently for us?
> Have disable_rx_and_rxnobuff_ints() only disable interrupts for the
> IPoIB ULP? Anyway, my knowledge here still sucks so I'm probably so far
> off base I'm not even on the field. Either way it's fun digging around here.
Based on previous comments, I'm hoping that's the case.
But I don't know either.
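If it can be done, I'd guess the IPoIB version wouldn't need to touch
the HCA's interrupt at all - it could just stop re-arming completion
notifications on its own CQ until its poll routine has drained it.
Very rough sketch below; I'm going from memory on the verbs calls
(ib_poll_cq()/ib_req_notify_cq()), the header location, and the
priv->cq / handle_wc() names, so treat the details as approximate:

#include <linux/netdevice.h>
#include <ib_verbs.h>	/* header location varies by tree */

/* CQ event handler: only fires after the CQ has been armed with
 * ib_req_notify_cq(). No MMIO here, nothing shared with other ULPs. */
static void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
{
	struct net_device *dev = dev_ptr;

	netif_rx_schedule(dev);		/* defer to ->poll() */
}

/* netdev->poll(): reap completions from IPoIB's own CQ only. */
static int ipoib_poll(struct net_device *dev, int *budget)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);
	int max = min(*budget, dev->quota);
	struct ib_wc wc;
	int done = 0;

	while (done < max && ib_poll_cq(priv->cq, 1, &wc) > 0) {
		handle_wc(priv, &wc);	/* existing send/recv completion code */
		done++;
	}

	*budget -= done;
	dev->quota -= done;

	if (done < max) {
		netif_rx_complete(dev);
		/* Re-arm only now; until this call the CQ stays quiet. */
		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
		/* (A real implementation would have to poll once more
		 * here to close the race with completions that arrived
		 * just before re-arming.) */
		return 0;
	}
	return 1;			/* CQ still has entries, poll again */
}

Whether arming/disarming the CQ that way is cheap enough on the HCA
side (versus the MMIO writes tg3 does) is exactly the latency question
I was worried about above.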
> >One can. Using SDP, netperf TCP_STREAM measured 650 MB/s using the
> >regular PCI-X card.
>
> Yes, I have the same speed results using perf_main().
>
> I don't think the perf_main() test is that interesting, though. It really
> just transfers the exact same memory window between two nodes (at least,
> as far as I can tell, that is what it does).
>
> Anyway, I'm just noticing that this simple dd test from memory doesn't
> go much over 1GB/sec. So this is an interesting non-IB problem.
>
> root at jcarr:/# dd if=/dev/shm/test of=/dev/null bs=4K
> 196608+0 records in
> 196608+0 records out
> 805306368 bytes transferred in 0.628504 seconds (1281306571 bytes/sec)
Yeah. Sounds like there is. You should be able to do several GB/s like that.
I suppose it's possibly an issue with the memory controller too.
ZX1 interleaves accesses across 4 DIMMs to get the memory bandwidth.
Might check to make sure your box is "optimally" configured too.
grundler@iota$ dd if=/dev/shm/test of=/dev/null bs=4K
dd: opening `/dev/shm/test': No such file or directory
Sorry - what do I need to do to create /dev/shm/test?
I should probably "cheat" and use a 16KB block size since that
is the native page size on ia64.
thanks,
grant