[ofa-general] error with ibv_poll_cq() call

Jack Morgenstein jackm at dev.mellanox.co.il
Wed Mar 26 00:54:39 PDT 2008


On Wednesday 26 March 2008 09:41, Tang, Changqing wrote:
> 
> > -----Original Message-----
> > From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> > Sent: Wednesday, March 26, 2008 2:30 AM
> > To: Tang, Changqing
> > Cc: general at lists.openfabrics.org; Roland Dreier
> > Subject: Re: [ofa-general] error with ibv_poll_cq() call
> >
> > On Wednesday 26 March 2008 09:09, Tang, Changqing wrote:
> > > Well, I said the "PEER" process may have destroyed the remote QP.
> > > The process calling ibv_poll_cq() still has the QP in RTS state.
> > >
> > > And though I use OFED 1.3, the HCA is not ConnectX.
> > >
> > > Any idea?
> > >
> > >
1. You are getting a "non-existent QP" error for the poll:

   The ONLY way this can happen on the send side is if you have
   destroyed a QP that uses that CQ for send completions, and the
   QP generated a SEND completion during the destroy: that is,
   after the CQ was cleaned of completions for that QP, but before
   ibv_cmd_destroy_qp() was called to move the QP to the error
   state. That window allows a new, "uncleaned" CQE to be queued.

2. If you always modify the QP to ERR before destroying it, the above
   race condition (noted in my previous post) does not apply.

   ** (ROLAND, libmthca has the same race condition) **.

- Jack
> > 1. What HCA are you using?
> 
> hca_id: mthca0
>         fw_ver:                         1.1.0
>         node_guid:                      0019:bbff:fff8:c048
>         sys_image_guid:                 0019:bbff:fff8:c04b
>         vendor_id:                      0x02c9
>         vendor_part_id:                 25204
>         hw_ver:                         0xA0
>         board_id:                       HP_0010000001
>         phys_port_cnt:                  1
> 
> > 2. Are there other QPs using the CQ?
> 
> Yes, there are connections to other processes. The connection to a particular peer process is to be disconnected.
> The peer process may have passed the handshake and destroyed the QP.
> 
> >    Is the CQ used for sends only?
> 
> Yes, it is for sends only, we use pure RDMA.
> 
> 
> >    How are you using the CQ?
> 
> We use selective signaling for send completions, but the last one is signaled.
> 
> 
> --CQ
> 
> 
> 
> >
> > - Jack
> >
> >
> 


