[ofa-general] Re: [RFH] IPoIB retransmission when sending multiple WR's to device

Krishna Kumar2 krkumar2 at in.ibm.com
Thu Aug 2 20:48:29 PDT 2007


Hi Roland,

I did one more test to check the out-of-order theory. I changed my new API
to be:

/* Original code, unmodified */
ipoib_start_xmit()
{
      original code
}

/* Added new xmit which is identical to original code but doesn't get
 * the lock */
ipoib_start_xmit_nolock()
{
      original code but without getting lock
}

/* Batching API */
ipoib_start_xmit_batch()
{
      get_lock()
      while (skbs in queue) {
            ret = ipoib_start_xmit_nolock()
      }
      unlock()
}

This in effect is fast sends of multiple skbs while holding the lock once
for all the skbs, without going back to the IP stack. The only difference
from my earlier batching code is that this creates one WR per skb instead
of multiple WR's. I got nearly identical retransmissions (around 240 for
the full run) as compared to the original code (225), while the multiple
WR's code had >100,000 retransmissions.
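
For readers who want to experiment with the locking pattern outside the
kernel, here is a minimal user-space sketch of "take the lock once, then
post one WR per skb". It is not the IPoIB driver code: struct pkt, struct
txq, xmit_nolock() and xmit_batch() are hypothetical stand-ins, the lock is
a C11 atomic_flag spinlock rather than the driver's lock, and "posting a
WR" is reduced to recording a sequence number.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_POSTED 64

/* Hypothetical stand-in for an skb; only the sequence number matters. */
struct pkt {
    int seq;
    struct pkt *next;
};

/* Hypothetical stand-in for the device TX state. */
struct txq {
    atomic_flag lock;          /* plays the role of the TX lock */
    struct pkt *head, *tail;   /* pending packets in FIFO order */
    int posted[MAX_POSTED];    /* seq numbers in the order they were posted */
    size_t nposted;
    unsigned lock_taken;       /* how many times the lock was acquired */
};

static void txq_init(struct txq *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
    q->nposted = 0;
    q->lock_taken = 0;
}

/* Append a packet to the pending queue (caller owns the memory). */
static void txq_enqueue(struct txq *q, struct pkt *p)
{
    assert(p != NULL);
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Post one "WR" for one packet. Caller must already hold q->lock. */
static int xmit_nolock(struct txq *q, struct pkt *p)
{
    if (q->nposted == MAX_POSTED)
        return -1;             /* send ring full */
    q->posted[q->nposted++] = p->seq;
    return 0;
}

/* Take the lock once, then drain the queue one WR per packet. */
static size_t xmit_batch(struct txq *q)
{
    size_t sent = 0;

    while (atomic_flag_test_and_set(&q->lock))
        ;                      /* spin until the lock is free */
    q->lock_taken++;
    while (q->head) {
        struct pkt *p = q->head;
        if (xmit_nolock(q, p) != 0)
            break;             /* leave the rest queued */
        q->head = p->next;
        if (!q->head)
            q->tail = NULL;
        sent++;
    }
    atomic_flag_clear(&q->lock);
    return sent;
}
```

Enqueueing five packets and calling xmit_batch() once should post all five
in FIFO order with a single lock acquisition, which is the property the
test above relies on.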

I agree that sending multiple WR's is still faster than this, but this is
the second best I could do, and still there is no increase in
retransmissions. The tx_queue_len on ib0 is 8192, and the recv/send sizes
are both 4K. The receiver shows no errors/drops in either ifconfig or
netstat -s.

Is there anything that can be concluded/suspected, or something else I
could try?

The latest run of the batching API gave me 719,679 retransmissions for a 16
min test run of 16/64 threads (iperf), which comes to 600 retransmissions
per sec, which is more than the retransmissions during the entire run for
the regular code!

thanks,

- KK

__________________

Hi Roland,

Roland Dreier <rdreier at cisco.com> wrote on 08/02/2007 09:59:23 PM:

>  > On the same topic that I wrote about earlier, I put debugs
>  > in my code to store all skbs in bufferA when enqueuing multiple
>  > skbs, and store all skbs to bufferB just before doing post.
>  > During post, I compare the two buffers to make sure that I am
>  > not posting in the wrong order, and that never happens.
>  >
>  > But I am getting a huge amount of retransmissions anyway,
>
> Why do you think the retransmissions are related to things being sent
> out of order?  Is it possible you're just sending much faster and
> overrunning the receiver's queue of posted receives?

I cannot be sure of that. But in regular code too, batching is done
in qdisc_run() in a different sense - it sends out packets *iteratively*.
In this case, I see only 225 retransmissions for the entire run of all
tests, while in my code I see 100,000 or more. (I think I gave the wrong
number earlier; this is the right one: 200 vs 100,000.)
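
The ordering check mentioned in the quoted text (log every skb at enqueue
time, log it again just before the post, then compare) could look roughly
like the sketch below. This is not the actual debug code: struct skb_log,
log_record() and log_first_mismatch() are hypothetical names, and the logs
hold plain pointers rather than real skbs.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAP 256

/* Hypothetical log of skb pointers, in the order they were seen. */
struct skb_log {
    const void *slot[LOG_CAP];
    size_t n;
};

/* Record one skb pointer at the end of the log. */
static void log_record(struct skb_log *l, const void *skb)
{
    assert(l->n < LOG_CAP);
    l->slot[l->n++] = skb;
}

/*
 * Compare the enqueue-time log with the post-time log.
 * Returns -1 if both logs hold the same pointers in the same order,
 * otherwise the index of the first position where they differ
 * (a length mismatch counts as a difference at the shorter length).
 */
static long log_first_mismatch(const struct skb_log *enq,
                               const struct skb_log *post)
{
    size_t n = enq->n < post->n ? enq->n : post->n;

    for (size_t i = 0; i < n; i++)
        if (enq->slot[i] != post->slot[i])
            return (long)i;
    return enq->n == post->n ? -1 : (long)n;
}
```

A return value of -1 corresponds to the "never happens" result reported
above: the skbs were posted in exactly the order they were enqueued.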

Is there any way to avoid the situation you are talking about? I am
already setting recv_queue_size=4096 when loading ipoib (so for mthca
too).
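
To make the overrun hypothesis from the quoted question concrete, here is a
toy model of a receive ring that is refilled at a fixed rate. It is only an
illustration under made-up assumptions: simulate_drops(), the per-tick
repost rate and the arrival patterns are all hypothetical, and nothing here
models real InfiniBand or mthca behaviour. It only shows that the same
number of packets can cause drops when sent as one burst and none when
sent at a steady pace.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of a receive ring (all parameters are assumptions):
 *   ring            - number of receive buffers that can be posted
 *   repost_per_tick - buffers the receiver can repost per time step
 *   arrivals[t]     - packets arriving during time step t
 * A packet that arrives when no buffer is posted is counted as a drop.
 * Returns the total number of drops over nticks time steps.
 */
static unsigned long simulate_drops(unsigned ring, unsigned repost_per_tick,
                                    const unsigned *arrivals, size_t nticks)
{
    unsigned avail = ring;      /* start with a fully posted ring */
    unsigned long drops = 0;

    assert(arrivals != NULL || nticks == 0);
    for (size_t t = 0; t < nticks; t++) {
        unsigned in = arrivals[t];

        if (in > avail) {
            drops += in - avail;   /* no buffer for the excess packets */
            avail = 0;
        } else {
            avail -= in;
        }
        /* Receiver reposts buffers, but never beyond the ring size. */
        avail += repost_per_tick;
        if (avail > ring)
            avail = ring;
    }
    return drops;
}
```

With a ring of 4 and 2 reposts per tick, eight packets spread as 2 per tick
fit without loss, while the same eight packets arriving in a single tick
overrun the ring. Whether anything like this happens in the real test is
exactly the open question.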

Thanks,

- KK



