[ofa-general] poll CQ failed -2 with connectX

Eli Cohen eli at dev.mellanox.co.il
Tue Oct 28 11:49:24 PDT 2008


On Mon, Oct 27, 2008 at 06:38:48PM -0400, Rick Warner wrote:
> Hi all,
> 
> I am configuring an Opteron cluster with ConnectX InfiniBand.  I have a 
> problem: if I run one of the NAS tests, it works the first and maybe the second 
> time, but after that the jobs fail instantly with messages like this:
> 
> [Rank 44][cm.c: line 860]poll CQ failed -2
> [Rank 51][cm.c: line 860]poll CQ failed -2
> [Rank 119][cm.c: line 860]poll CQ failed -2
> [Rank 85][cm.c: line 860]poll CQ failed -2
> [Rank 0][cm.c: line 860]poll CQ failed -2
> [Rank 9][cm.c: line 860]poll CQ failed -2
> [Rank 26][cm.c: line 860]poll CQ failed -2
> [Rank 43][cm.c: line 860]poll CQ failed -2
> [Rank 94][cm.c: line 860]poll CQ failed -2
> [Rank 111][cm.c: line 860]poll CQ failed -2

This error means that a CQE was polled that belongs to a nonexistent
QP. However, I remember a case on an Opteron system that showed the
same symptom, and it turned out to be a system problem that was
resolved by a BIOS update. Can you check whether a BIOS update is
available for your system?
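
To illustrate what I mean by a CQE that belongs to a nonexistent QP,
here is a toy model in Python. It is only a sketch and not the actual
driver code: the names (Cqe, poll_one, qp_table) and the status values
are assumptions made up for illustration, with -2 chosen to match the
value in your log. The idea is that each completion entry carries a QP
number, and polling fails when that number cannot be found among the
QPs the process currently has.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical status codes for this sketch; -2 is chosen to match the log.
CQ_OK = 0
CQ_EMPTY = -1
CQ_POLL_ERR = -2


@dataclass
class Cqe:
    qpn: int    # QP number recorded in the completion entry
    wr_id: int  # caller's work request identifier


def poll_one(cq, qp_table):
    """Consume one CQE and return (status, completion or None)."""
    if not cq:
        return CQ_EMPTY, None
    cqe = cq.popleft()
    qp = qp_table.get(cqe.qpn)
    if qp is None:
        # The CQE names a QP that is not (or is no longer) in the table.
        return CQ_POLL_ERR, None
    return CQ_OK, (qp, cqe.wr_id)


def demo():
    """Poll until the first non-OK status and return the statuses seen."""
    qp_table = {0x41: "qp-0x41"}  # only one QP exists
    cq = deque([Cqe(qpn=0x41, wr_id=1), Cqe(qpn=0x99, wr_id=2)])
    statuses = []
    while True:
        status, _ = poll_one(cq, qp_table)
        statuses.append(status)
        if status != CQ_OK:
            break
    return statuses
```

In this model the second entry refers to QP number 0x99, which was
never created, so the poll fails on it. On real hardware such an entry
could come from a stale or corrupted CQE, which is why a platform
problem is worth ruling out.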

> 
> I can easily reproduce this with only 2 systems using a 16 process LU job, 
> class B.
> 
> Here are the configs I've tried:
> Suse 11 with distro-provided IB driver and libraries, etc., using mvapich as 
> provided by Ohio State
> Suse 11 with distro driver, using OFED 1.3.1 libraries and mvapich
> Suse 10.3 with OFED 1.3.1, OFED 1.2.5.4, and OFED 1.4rc3
> 
> They all have the same basic problem.  I think one of them reported "Error 
> polling CQ" instead of "poll CQ failed".
> 
> If I replace the connectX cards with regular DDR cards the problem goes away.
> 
> I'm getting quite stumped at this point and would appreciate any suggestions 
> or patches.
> 
> Thanks,
> Rick
> -- 
> Richard Warner
> Lead Systems Integrator
> Microway, Inc
> (508)732-5517