[ofa-general] Re: [PATCH 11/12 -Rev2] IPoIB xmit API addition

Sun Jul 22 19:53:25 PDT 2007

Hi Michael,

"Michael S. Tsirkin" <mst at dev.mellanox.co.il> wrote on 07/22/2007 03:11:36 PM:

> > +   /*
> > +    * Handle skbs completion from tx_tail to wr_id. It is possible to
> > +    * handle WC's from earlier post_sends (possible multiple) in this
> > +    * iteration as we move from tx_tail to wr_id, since if the last
> > +    * WR (which is the one which had a completion request) failed to be
> > +    * sent for any of those earlier request(s), no completion
> > +    * notification is generated for successful WR's of those earlier
> > +    * request(s).
> > +    */
>
> AFAIK a signalled WR will always generate a completion.
> What am I missing?

Yes, a signalled WR will always generate a completion. I am trying to catch
the case where, say, I send 64 skbs and set signalling only for the last skb,
with the others unsignalled. If the driver finds the last WR bad for some
reason, it fails the send for that WR synchronously, and that WR happens to
be the only signalled one. So after skbs 1 to 63 have been sent, no
completion will ever be generated for them. That was my understanding of how
this works, and I coded it so that the next post cleans up the previous
post's completions.

> >
> > +         /*
> > +          * Better error handling can be done here, like free
> > +          * all untried skbs if err == -ENOMEM. However at this
> > +          * time, we re-try all the skbs, all of which will
> > +          * likely fail anyway (unless device finished sending
> > +          * some out in the meantime). This is not a regression
> > +          * since the earlier code is not doing this either.
> > +          */
>
> Are you retrying posting skbs? Why is this a good idea?
> AFAIK, earlier code did not retry posting WRs at all.

Not exactly. If I send 64 skbs to the device and the provider returns a bad
WR at skb #50, then I have to try skbs #51-64 again, since the provider
bails out at the first failure and has not attempted to send those. The
provider has of course already sent out skbs #1-49 before returning failure
at skb #50. So it is not strictly a retry, just transmission of the next
skbs, which is what the current code also does. I tested this part by
simulating errors in mthca_post_send and verified that the next iteration
clears up the remaining skbs.

> The comment seems to imply that post send fails as a result of SQ overflow -

Correct.

> do you see SQ overflow errors in your testing?

No.

> AFAIK, IPoIB should never overflow the SQ.

Correct. It should never happen unless IPoIB has a bug :) I guess the comment
should be removed?

Thanks,

- KK