[ofa-general] proper way to recover from poll CQ failed error

Dotan Barak dotanb at dev.mellanox.co.il
Thu Mar 13 01:05:28 PDT 2008


Hi.

The fact that ibv_poll_cq failed indicates that something bad happened.
Usually this failure should not cause any wider problem; only the
process that hit the failure is affected by it.

I personally think that the ib_* performance tools are better suited to
checking the performance of your subnet.

I would be glad if you could answer the following questions:
Is this error consistent?
Can you please send me the ibv_devinfo output from your machines?
Did you see any error messages in /var/log/messages when this error
occurred?
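
A minimal sketch of how the information asked for above could be
gathered on each machine. The function names (ib_log_lines,
collect_ib_report) are hypothetical, and the grep pattern is only an
assumption about which log lines are relevant (mlx4 is the ConnectX
driver), so adjust it to your setup:

```shell
# Hypothetical helper: print the lines of a log file that mention the
# InfiniBand stack. The pattern is an assumption, not a complete list.
ib_log_lines() {
    grep -iE 'mlx4|ib_|infiniband' "${1:-/var/log/messages}"
}

# Hypothetical helper: print the ibv_devinfo output (when the tool is
# installed), followed by the matching log lines.
collect_ib_report() {
    if command -v ibv_devinfo >/dev/null 2>&1; then
        ibv_devinfo
    else
        echo "ibv_devinfo not found in PATH"
    fi
    ib_log_lines "${1:-/var/log/messages}"
}
```

Running "collect_ib_report > ib_report.txt" on both machines right after
the failure would give a file to attach to the reply. Reading
/var/log/messages usually requires root.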

Thanks,
Dotan

Murray Smigel wrote:
> Hi,
> I am running OFED-3.0 using ConnectX adapters in a two machine direct 
> connect mode.
> Most of the various pingpong tests seem ok, but when I run
> ibv_srq_pingpong -s 500 -n 1000
>
> I get "poll CQ failed -2" when I start up the client side.  Smaller 
> values of -s worked fine.
> Once this happens, no other pingpong tests seem to work.
> I have then unloaded all the ib_* mlmx_* and iw_* modules, reloaded 
> them and things still
> fail. I have to reboot the machines to get things back.
>
> 1) Is there a cleaner way to recover from this situation?
> 2) Is the initial failure an indication that something else is wrong?
> 3) Is the -s 1 latency I see with ibv_rc_pingpong of ~7 microseconds 
> reasonable?
>
> Thanks,
> murray smigel
>
>



