[ofa-general] Re: [PATCH 10/10 REV5] [E1000] Implement batching

Krishna Kumar2 krkumar2 at in.ibm.com
Sun Sep 16 20:56:47 PDT 2007


Hi Evgeniy,

Evgeniy Polyakov <johnpol at 2ka.mipt.ru> wrote on 09/14/2007 06:17:14 PM:

> >     if (unlikely(skb->len <= 0)) {
> >        dev_kfree_skb_any(skb);
> > -      return NETDEV_TX_OK;
> > +      return NETDEV_TX_DROPPED;
> >     }
>
> This change could actually go in as its own patch, although I am not sure
> it is ever used. Just a thought, not a stopper.

Since this flag is new and useful only for batching, I feel it is OK to
include it in this patch.

> > +   if (!skb || (blist && skb_queue_len(blist))) {
> > +      /*
> > +       * Either batching xmit call, or single skb case but there are
> > +       * skbs already in the batch list from previous failure to
> > +       * xmit - send the earlier skbs first to avoid out of order.
> > +       */
> > +      if (skb)
> > +         __skb_queue_tail(blist, skb);
> > +      skb = __skb_dequeue(blist);
>
> Why is it put at the end?

There is a bug that I had explained in rev4 (see XXX below) that resulted
in skbs being sent out of order. The fix is that if the driver gets
called with a skb but there are older skbs already in the batch list
(which failed to get sent out), it sends those skbs first before this one.
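
To make the ordering rule concrete, here is a rough user-space sketch of
the hunk quoted above. It is not the driver code: struct pkt, struct
batch_list, the batch_* helpers and next_to_xmit() are hypothetical
stand-ins for sk_buff, sk_buff_head, __skb_queue_tail()/__skb_dequeue()
and the start of the xmit routine. The NULL-list guard is also my own
addition for the sketch; the patch assumes a batching call always comes
with a list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff / sk_buff_head (not kernel types). */
#define BATCH_CAP 16

struct pkt {
    int seq;                       /* order in which the stack produced it */
};

struct batch_list {
    struct pkt *slot[BATCH_CAP];   /* ring buffer of pending packets */
    size_t head;                   /* index of the oldest packet */
    size_t len;                    /* number of queued packets */
};

static void batch_init(struct batch_list *b)
{
    b->head = 0;
    b->len = 0;
}

static size_t batch_len(const struct batch_list *b)
{
    return b->len;
}

/* Append at the tail, so the newest packet is sent last. */
static void batch_queue_tail(struct batch_list *b, struct pkt *p)
{
    assert(b->len < BATCH_CAP);
    b->slot[(b->head + b->len) % BATCH_CAP] = p;
    b->len++;
}

/* Remove from the head, so the oldest packet is sent first. */
static struct pkt *batch_dequeue(struct batch_list *b)
{
    struct pkt *p;

    if (b->len == 0)
        return NULL;
    p = b->slot[b->head];
    b->head = (b->head + 1) % BATCH_CAP;
    b->len--;
    return p;
}

/*
 * Decide which packet goes to the hardware next.
 * pkt == NULL models a batching xmit call; a non-NULL pkt models the
 * regular single-packet call. If older packets are still waiting in the
 * batch list, the new packet is queued behind them and the oldest one
 * is returned instead.
 */
static struct pkt *next_to_xmit(struct batch_list *blist, struct pkt *pkt)
{
    if (!pkt || (blist && batch_len(blist))) {
        if (!blist)
            return NULL;           /* sketch-only guard, see note above */
        if (pkt)
            batch_queue_tail(blist, pkt);
        pkt = batch_dequeue(blist);
    }
    return pkt;
}
```

With packets 1 and 2 left in the list by a failed batching xmit, a
single-packet call for packet 3 returns packet 1 and leaves 2 and 3
queued, so the wire order stays 1, 2, 3.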

Thanks,

- KK

[XXX] Dave had suggested using batching only in the net_tx_action case.
When I implemented that in earlier revisions, there were lots of TCP
retransmissions (about 18,000 to every 1 in regular code). I found the
reason for part of that problem: skbs get queued up in dev->qdisc (when
the tx lock could not be acquired or the queue was blocked); when
net_tx_action is called later, it passes the batch list as an argument to
qdisc_run, and this results in skbs being moved to the batch list; then
the batching xmit also fails due to tx lock failure; the next many regular
xmits of a single skb go through the fast path (passing a NULL batch list
to qdisc_run) and send those skbs out to the device while the previous
skbs are cooling their heels in the batch list.

The first fix was to not pass NULL/batch-list to qdisc_run() but to always
check whether skbs are present in the batch list when trying to xmit. This
reduced retransmissions by a third (from 18,000 to around 12,000), but led
to another problem while testing: iperf transmits almost zero data for a
higher number of parallel flows, like 64 or more (and when I run iperf for
a 2 min run, it takes about 5-6 mins, reports that it ran 0 secs, and the
amount of data transferred is a few MB). I don't know why this happens with
this being the only change (any ideas are very much appreciated).

The second fix, which resolved this, was to go back to Dave's suggestion to
use batching only in the net_tx_action case, and modify the driver to check
whether skbs are present in the batch list and send them out first before
sending the current skb. I still see huge retransmissions for IPoIB (but
not for E1000), though the number has dropped to 12,000 from the earlier
18,000.




