[openib-general] How about ib_send_page() ?

Grant Grundler iod00d at hp.com
Wed May 18 14:41:25 PDT 2005


On Wed, May 18, 2005 at 01:00:13PM -0700, Jeff Carr wrote:
> Grant Grundler wrote:
> >grundler at iota$ dd if=/dev/shm/test of=/dev/null bs=4K
> >dd: opening `/dev/shm/test': No such file or directory
> 
> I just did dd if=/dev/zero of=test bs=1M count=768

I realized later last night that /dev/shm is tmpfs and that I had
to create the file myself like you suggested above.

> I assume that the speed would be the same if there was non-zero data :)

Maybe. I don't know.

I've rerun this to get some idea of what is going on:
grundler at gsyprf3:~$ dd if=/dev/shm/ggg of=/dev/null bs=4k
255568+0 records in
255568+0 records out
1046806528 bytes transferred in 0.580582 seconds (1803029894 bytes/sec)
grundler at gsyprf3:~$ dd if=/dev/shm/ggg of=/dev/null bs=16k
63892+0 records in
63892+0 records out
1046806528 bytes transferred in 0.316682 seconds (3305546209 bytes/sec)
grundler at gsyprf3:~$ dd if=/dev/shm/ggg of=/dev/null bs=64k
15973+0 records in
15973+0 records out
1046806528 bytes transferred in 0.275389 seconds (3801192840 bytes/sec)

4k  -> 1.8 GB/s
16k -> 3.3 GB/s
64k -> 3.8 GB/s

This seems reasonable.
IIRC the ZX1 chipset has a 6 GB/s backplane, but one CPU can only drive ~4 GB/s.
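
FWIW, a quick loop like the one below would sweep several block sizes
in one go (untested; the block sizes are just a sample set, and the
file is the same tmpfs file as above):

# sweep dd block sizes against the file staged in tmpfs
for bs in 4k 16k 64k 256k; do
    echo "bs=$bs"
    dd if=/dev/shm/ggg of=/dev/null bs=$bs
done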

> I only have 2GB of ram.

I don't think that's important as long as you have enough memory
available for /dev/shm. More important is how memory is interleaved
across the DIMMs and how well the memcpy routines are optimized.
You need to physically verify DIMM placement; Linux should already
have highly optimized memcpy routines for most architectures today.
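
If dmidecode and numactl happen to be installed, something like this
gives a rough picture of DIMM population and interleave without
opening the box (just a suggestion -- output format varies by
platform, and dmidecode needs root):

# list populated DIMM slots and their sizes
dmidecode --type memory | grep -E 'Locator|Size'
# show the node/memory layout as the kernel sees it
numactl --hardware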

> >I should probably "cheat" and use 16KB block since that
> >is the native page size on ia64.
> 
> "cheating" in that sense is good. 16KB blocks will probably increase the 
> performance a lot. I was even thinking I would try to do that on my Xeon 
> system. (I don't know if that's possible.)

I don't see why not.
It obviously helps on the IA64 box.
We want to measure the copy speed, not the syscall speed, right? :^)
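
On any box you can just ask for the native page size and feed it to
dd, e.g. (untested sketch):

# use the native page size as the dd block size
pagesize=$(getconf PAGESIZE)
dd if=/dev/shm/ggg of=/dev/null bs=$pagesize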

BTW, can you remind me again why this was important to the rdma_lat test?
It was just to prove the VM/memcpy path wasn't the bottleneck, right?

grant


