[ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V3] patch for review

Pradeep Satyanarayana pradeep at us.ibm.com
Thu May 3 11:14:56 PDT 2007


"Michael S. Tsirkin" <mst at dev.mellanox.co.il> wrote on 05/02/2007 08:55:47 
PM:

> > > > +      if (ipoib_cm_post_receive(dev, i << 32 | index)) {
> > > 
> > > 1. It seems there are multiple QPs mapped to a single CQ -
> > >    and each gets ipoib_recvq_size recv WRs above.
> > >    Is that right? How do you prevent CQ overrun then?
> > 
> > Good point! Looking at the IB spec, it appears that a CQ overflow
> > results in a Local Work Queue catastrophic error and puts the QP
> > (receiver side) into the error state.
> 
> Look further in spec - you get CQ error, too.
> 
> > Hence, I am speculating that the sending side will see an error.
> > This will result in the sending side destroying the QP and sending
> > a DREQ message, which will remove the receive-side QP.
> > 
> > A new set of QPs will be created on the send side (this is RC) and
> > the connection setup starts over again. It will continue, but at a
> > degraded rate.
> > Is this correct? What other alternative do you suggest - create a
> > CQ per QP? Is the max number of CQs an issue to consider, if we
> > adopt this approach?
> 
> We were switching to NAPI though, and NAPI kind of forces you to use
> a common CQ, I think.

What if in ipoib_transport_dev_init() size is changed to something like:

size = ipoib_sendq_size + NOSRQ_INDEX_RING_SIZE * ipoib_recvq_size + 1;

used by the ib_create_cq() call, for the NOSRQ case only? Yes, we will
end up consuming a lot more memory - do you see any (other) problems
with that?

Pradeep
pradeep at us.ibm.com
