[openib-general] Re: Speeding up IPoIB.

Hal Rosenstock halr at voltaire.com
Thu Apr 20 19:14:31 PDT 2006


On Thu, 2006-04-20 at 21:03, Bernard King-Smith wrote:
> Grant Grundler wrote:
> 
> > Currently we only get 40% of the link bandwidth, compared to
> > 85% for 10 GigE. (Yes, I know the cost differences, which favor IB.)
> 
> Grant> 10gige is getting 85% without TOE?
> Grant> Or are they distributing event handling across several CPUs?
> 
> On 10 GigE they are using large send, where the adapter reads a 60K
> buffer and fragments it into 1500- or 9000-byte Ethernet packets.
> Essentially, the fragmentation of TCP data into Ethernet packets is
> offloaded to the adapter. This is similar to IB RC mode fragmenting
> larger buffers into 2000-byte link-level frames/packets.
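> 
> (For reference, a minimal sketch in C of turning large send on from
> user space through the legacy ethtool ioctl; ETHTOOL_STSO and
> SIOCETHTOOL are the stock Linux interfaces, while enable_tso and
> "eth0" are just illustrative names:
> 
>     #include <string.h>
>     #include <unistd.h>
>     #include <sys/ioctl.h>
>     #include <sys/socket.h>
>     #include <net/if.h>
>     #include <linux/ethtool.h>
>     #include <linux/sockios.h>
> 
>     /* Enable TCP segmentation offload (large send) on an interface. */
>     int enable_tso(const char *ifname)
>     {
>         struct ethtool_value ev = { .cmd = ETHTOOL_STSO, .data = 1 };
>         struct ifreq ifr;
>         int fd, ret;
> 
>         fd = socket(AF_INET, SOCK_DGRAM, 0);
>         if (fd < 0)
>             return -1;
>         memset(&ifr, 0, sizeof(ifr));
>         strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
>         ifr.ifr_data = (void *)&ev;   /* ethtool command buffer */
>         ret = ioctl(fd, SIOCETHTOOL, &ifr);
>         close(fd);
>         return ret;
>     }
> 
> This is the same thing "ethtool -K eth0 tso on" does.)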
> 
> > However, two things hurt user-level protocols. First is scaling and
> > memory requirements. Looking at parallel file systems on large
> > clusters, SDP ended up consuming so much memory that it couldn't be
> > used. With N-by-N socket connections per node, the buffer space and
> > QP memory that SDP requires got out of control. There is something to
> > be said for sharing buffer and QP space across lots of sockets.
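> > 
> > To make the scaling concrete, here is a back-of-the-envelope sketch
> > in C; the per-connection buffer and QP figures are illustrative
> > assumptions, not measurements:
> > 
> >     #include <stdio.h>
> > 
> >     /* Rough per-node SDP memory for an N-node all-to-all job.
> >      * Per-connection costs are assumed figures for illustration. */
> >     int main(void)
> >     {
> >         const long nodes        = 512;              /* cluster size       */
> >         const long conns        = nodes - 1;        /* sockets per node   */
> >         const long buf_per_conn = 2L * 128 * 1024;  /* send+recv bufs (B) */
> >         const long qp_per_conn  = 64L * 1024;       /* QP/CQ state (B)    */
> > 
> >         long per_node = conns * (buf_per_conn + qp_per_conn);
> >         printf("per-node SDP memory: %ld MB\n", per_node >> 20);
> >         return 0;
> >     }
> > 
> > Even these modest numbers give roughly 160 MB per node just for
> > socket state, before the application touches a byte.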
> 
> Grant> My guess is it's an easier problem to fix SDP than reducing TCP/IP
> Grant> cache/CPU footprint. I realize only a subset of apps can (or will
> Grant> try to) use SDP because of setup/config issues.  I still believe SDP
> Grant> is useful to a majority of apps without having to recompile them.
> 
> I agree that reducing any protocol's footprint is a very challenging job;
> however, going to a larger MTU drops the overhead much faster. If IB
> supported a 60K MTU, then the TCP/IP overhead would be 1/30 of what we
> measure today.

Then what you want is IPoIB-CM where the MTU can be much larger.
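 
For example (a minimal sketch in C, assuming the connected-mode sysfs
knob from the IPoIB-CM patches and an interface named ib0; 65520 is
the MTU the CM code allows, far above the 2044-byte UD-mode limit):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/sockios.h>

    /* Switch ib0 into connected mode, then raise its MTU.
     * Sketch only: run as root, error handling trimmed. */
    int main(void)
    {
        FILE *f = fopen("/sys/class/net/ib0/mode", "w");
        if (!f)
            return 1;
        fputs("connected\n", f);   /* IPoIB-CM instead of datagram */
        fclose(f);

        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "ib0", IFNAMSIZ - 1);
        ifr.ifr_mtu = 65520;       /* vs. 2044 bytes in UD mode */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return 1;
        int ret = ioctl(fd, SIOCSIFMTU, &ifr);
        close(fd);
        return ret ? 1 : 0;
    }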

-- Hal

>  Traversing the TCP/IP stack once for a 60K packet costs much less than
> traversing it 30 times with 2000-byte packets for the same amount of
> data transmitted.
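> 
> (As a quick check of the arithmetic, with a 60000-byte buffer:
> 
>     #include <stdio.h>
> 
>     /* Stack traversals needed to move one 60K buffer at two MTUs. */
>     int main(void)
>     {
>         const int bytes = 60000;
>         const int mtu_small = 2000, mtu_large = 60000;
> 
>         int small = (bytes + mtu_small - 1) / mtu_small;  /* = 30 */
>         int large = (bytes + mtu_large - 1) / mtu_large;  /* = 1  */
>         printf("%d traversals vs. %d\n", small, large);
>         return 0;
>     }
> 
> so the per-packet stack cost drops by the same factor of 30.)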
> 
> > The other issue is flow control across hundreds of autonomous sockets.
> > In TCP/IP, traffic can be managed so that there is some fairness
> > (multiplexing, QoS, etc.) across all active sockets.  For user-level
> > protocols like SDP and uDAPL, you can't manage traffic across multiple
> > autonomous user application connections because there is nowhere to
> > see all of them at the same time for management. This can lead to
> > overrunning adapters or timeouts to the applications. This tends to be
> > a large-system problem when you have lots of CPUs.
> 
> Grant> I'm not competent to disagree in detail.
> Grant> Fabian Tillier and Caitlin Bestler can (and have) addressed this.
> 
> I would be very interested in any pointers to their work.
> 
> > The footprint of IPoIB + TCP/IP is large, as on any system. However,
> > as you get to higher CPU counts, the issue becomes less of a problem
> > since more unused CPU cycles are available, though affinity (CPU and
> > memory) and cacheline-miss issues get worse.
> 
> Grant> Hrm...the concept of "unused CPU cycles" is bugging me as someone
> Grant> who occasionally gets to run benchmarks.  If a system today has
> Grant> unused CPU cycles, then will adding a faster link change the CPU
> Grant> load if the application doesn't change?
> 
> This goes back to systems that are busy doing nothing, generally while
> waiting on memory, a cacheline miss, or I/O to disks. This is where
> hyperthreading has shown some speedups for benchmarks that were
> previously totally CPU-limited. The unused cycles are "wait" cycles
> during which something can run if it can get in quickly. You can't fit
> a whole TCP stack into those waits, but small parts of the stack or
> driver could fit in the other thread. Yes, I do benchmarking and was
> skeptical at first.
> 
> Grant> thanks,
> Grant> grant
> 
> Bernie King-Smith
> IBM Corporation
> Server Group
> Cluster System Performance
> wombat2 at us.ibm.com    (845)433-8483
> Tie. 293-8483 or wombat2 on NOTES
> 
> "We are not responsible for the world we are born into, only for the world
> we leave when we die.
> So we have to accept what has gone before us and work to change the only
> thing we can,
> -- The Future." William Shatner
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
