[ofa-general] error with ibv_poll_cq() call
Jack Morgenstein
jackm at dev.mellanox.co.il
Wed Mar 26 00:54:39 PDT 2008
On Wednesday 26 March 2008 09:41, Tang, Changqing wrote:
>
> > -----Original Message-----
> > From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> > Sent: Wednesday, March 26, 2008 2:30 AM
> > To: Tang, Changqing
> > Cc: general at lists.openfabrics.org; Roland Dreier
> > Subject: Re: [ofa-general] error with ibv_poll_cq() call
> >
> > On Wednesday 26 March 2008 09:09, Tang, Changqing wrote:
> > > Well, I said the "PEER" process may have destroyed the remote QP.
> > > The process calling ibv_poll_cq() still has the QP in RTS state.
> > >
> > > And though I use OFED 1.3, the HCA is not ConnectX.
> > >
> > > Any ideas?
> > >
1. You are getting a "non-existent QP" error from the poll.
On a send CQ, the ONLY way this can happen is if you have
destroyed some QP which uses that CQ for send completions,
and the QP generated a SEND completion during the destroy:
after the CQ was cleaned of completions for that QP, but
before ibv_cmd_destroy_qp() was called to move the QP to the
error state. That window allows a new, "uncleaned" CQE to be queued.
2. If you always modify the QP to ERR before destroying it, the above
race condition (noted in my previous post) does not apply.
** (ROLAND, libmthca has the same race condition) **.
- Jack
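
The race in point 1 and the workaround in point 2 can be illustrated with a
small toy model. This is only a sketch under stated assumptions: it does not
call libibverbs or libmthca, every toy_* name is hypothetical, and it assumes
that moving a QP to ERR flushes its outstanding sends to the CQ at once so
that nothing more is queued later. In a real application the corresponding
step would be ibv_modify_qp() with qp_state set to IBV_QPS_ERR, issued before
ibv_destroy_qp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CQ_DEPTH 16

/* Toy model only: none of these names are libibverbs or libmthca APIs. */
enum toy_qp_state { TOY_RTS, TOY_ERR, TOY_DESTROYED };

struct toy_qp {
    int qpn;                  /* QP number */
    enum toy_qp_state state;
    int inflight;             /* sends posted but not yet completed */
};

struct toy_cq {
    int qpn[TOY_CQ_DEPTH];    /* QP number carried by each queued CQE */
    size_t count;
};

/* Queue one CQE for the given QP number. */
void toy_cq_push(struct toy_cq *cq, int qpn)
{
    assert(cq->count < TOY_CQ_DEPTH);
    cq->qpn[cq->count++] = qpn;
}

/* The hardware finishes one in-flight send; only a QP still in RTS does this. */
void toy_hw_complete_one(struct toy_cq *cq, struct toy_qp *qp)
{
    if (qp->state == TOY_RTS && qp->inflight > 0) {
        toy_cq_push(cq, qp->qpn);
        qp->inflight--;
    }
}

/* Assumed ERR semantics: outstanding sends are flushed to the CQ at once. */
void toy_modify_to_err(struct toy_cq *cq, struct toy_qp *qp)
{
    while (qp->inflight > 0) {
        toy_cq_push(cq, qp->qpn);
        qp->inflight--;
    }
    qp->state = TOY_ERR;
}

/* Drop every CQE that belongs to qpn, compacting the queue. */
void toy_cq_clean(struct toy_cq *cq, int qpn)
{
    size_t kept = 0;
    for (size_t i = 0; i < cq->count; i++) {
        if (cq->qpn[i] != qpn)
            cq->qpn[kept++] = cq->qpn[i];
    }
    cq->count = kept;
}

/* Destroy as described in the post: clean the CQ first, then tear down the QP.
 * completion_in_window models a send finishing between the two steps. */
void toy_destroy_qp(struct toy_cq *cq, struct toy_qp *qp,
                    bool completion_in_window)
{
    toy_cq_clean(cq, qp->qpn);
    if (completion_in_window)
        toy_hw_complete_one(cq, qp);
    qp->state = TOY_DESTROYED;
}

/* Count CQEs that refer to qpn (stale if that QP has been destroyed). */
size_t toy_cqes_for(const struct toy_cq *cq, int qpn)
{
    size_t n = 0;
    for (size_t i = 0; i < cq->count; i++) {
        if (cq->qpn[i] == qpn)
            n++;
    }
    return n;
}
```

In this model, destroying a QP that is still in RTS with one send in flight
can leave a CQE for a QP that no longer exists, which is what a later poll
would trip over. Calling toy_modify_to_err() first moves that completion into
the CQ before the clean step, so the window is empty by the time the QP is
torn down.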
> > 1. What HCA are you using?
>
> hca_id: mthca0
> fw_ver: 1.1.0
> node_guid: 0019:bbff:fff8:c048
> sys_image_guid: 0019:bbff:fff8:c04b
> vendor_id: 0x02c9
> vendor_part_id: 25204
> hw_ver: 0xA0
> board_id: HP_0010000001
> phys_port_cnt: 1
>
> > 2. Are there other QPs using the CQ?
>
> Yes, there are connections to other processes. The connection to one particular peer process is being disconnected.
> The peer process may have already passed the handshake and destroyed the QP.
>
> > Is the CQ used for sends only?
>
> Yes, it is for sends only, we use pure RDMA.
>
>
> > How are you using the CQ?
>
> We use selective signaling for send completions, but the last one is signaled.
>
>
> --CQ
>
>
>
> >
> > - Jack
> >
> >
>