[openib-general] Re: openib-general Digest, Vol 22, Issue 114

Bernard King-Smith wombat2 at us.ibm.com
Wed Apr 19 07:10:36 PDT 2006


    Shirley> After completion handler receives the notification, don't
    Shirley> poll the CQ right away, and wait for more WCs in
    Shirley> CQ. That way can reduce the CQ lock overhead.

    Roland> That's interesting... it makes sense, and it argues in
    Roland> favor of deferring CQ polling to a kernel thread.  Of
    Roland> course this will hurt ping-pong latency.  Maybe it's
    Roland> better to just implement NAPI though...

    Roland> And actually it argues against splitting the CQ, because
    Roland> having one CQ increases the number of CQ entries that we
    Roland> have a chance to poll at any one time, by lumping send and
    Roland> receive completions together...
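
For reference, the batching argument in the quoted text can be put in
numbers with a toy model. This is only a sketch: ModelCQ, drain and
locks_needed are hypothetical names, the queue is a plain lock-protected
FIFO rather than a real CQ, and the only thing it measures is how many
times the poll path takes the lock.

```python
from collections import deque
import threading


class ModelCQ:
    """Toy completion queue: a lock-protected FIFO (not the verbs API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = deque()
        self.lock_acquisitions = 0  # counts poll calls only, not posts

    def post(self, wc):
        """Add one completion to the queue."""
        with self._lock:
            self._entries.append(wc)

    def poll(self, max_entries):
        """Remove and return up to max_entries completions in one lock hold."""
        with self._lock:
            self.lock_acquisitions += 1
            batch = []
            while self._entries and len(batch) < max_entries:
                batch.append(self._entries.popleft())
            return batch


def drain(cq, max_entries):
    """Poll until the queue is empty; return the number of completions reaped."""
    reaped = 0
    while True:
        batch = cq.poll(max_entries)
        if not batch:
            return reaped
        reaped += len(batch)


def locks_needed(pending, max_entries):
    """Lock acquisitions needed to drain `pending` completions."""
    cq = ModelCQ()
    for i in range(pending):
        cq.post(i)
    assert drain(cq, max_entries) == pending
    return cq.lock_acquisitions
```

In this model, draining 64 pending completions takes 65 lock
acquisitions when polling one entry at a time and 5 when polling in
batches of 16 (the extra one in each case is the final poll that finds
the queue empty). It says nothing about which CPU does the work, which
is the point of the reply below.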

The assumption you have here is that one CPU is capable of handling all the
completions without limiting bandwidth. We have seen the opposite: at high
throughput we end up with one CPU pegged. The benefit you are after is
lower latency from handling both send and receive processing off the same
thread/interrupt, but you have to balance that against bandwidth
limitations. If you think 4X has a bandwidth problem using IPoIB, wait
until 12X comes out.
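
A rough back-of-the-envelope sketch of why a single CPU becomes the
limit. The figures are assumptions, not measurements: SDR links with
8b/10b encoding (8 Gb/s of data on 4X, 24 Gb/s on 12X), full-size
2044-byte IPoIB packets, and header overhead ignored. The function names
are hypothetical.

```python
def packets_per_second(data_rate_gbps, payload_bytes):
    """Packets per second needed to fill the link with full-size packets."""
    return data_rate_gbps * 1e9 / 8 / payload_bytes


def per_packet_budget_us(data_rate_gbps, payload_bytes, cpus=1):
    """CPU time available per packet, in microseconds, at line rate.

    Assumes the completion work is spread evenly over `cpus` processors.
    """
    return cpus * 1e6 / packets_per_second(data_rate_gbps, payload_bytes)


# Assumed link data rates (SDR, after 8b/10b) and IPoIB MTU.
LINK_4X_GBPS = 8
LINK_12X_GBPS = 24
IPOIB_MTU_BYTES = 2044

if __name__ == "__main__":
    for name, rate in (("4X", LINK_4X_GBPS), ("12X", LINK_12X_GBPS)):
        budget = per_packet_budget_us(rate, IPOIB_MTU_BYTES)
        print(f"{name}: {budget:.3f} us per packet on one CPU")
```

Under these assumptions one CPU has about 2 microseconds per packet at
4X and under 0.7 microseconds at 12X to cover the interrupt, the CQ
poll, and the network stack, for send and receive combined if they
share a CQ. Splitting the completions across two CPUs doubles that
budget.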

What per-CPU utilization do you see on mthca on a multi-CPU machine
running at peak bandwidth?

Roland>  - R.


Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world
we leave when we die.
So we have to accept what has gone before us and work to change the only
thing we can,
-- The Future." William Shatner



