[openib-general] ibv_rc_pingpong debugging..

Heiko J Schick info at schihei.de
Fri Apr 28 23:49:40 PDT 2006


Hello Troy,

On 29.04.2006, at 01:15, Troy Benjegerdes wrote:

> So, how do I start debugging this?
>
> ibv_devinfo reports the port as active.. what else would cause this?
> (I am running the userspace modules from
> http://openib.red-bean.com/rc2/SOURCES/, and kernel 2.6.16.11)
>
> [root at node3 netpipe3-dev]# ibv_rc_pingpong -n 1 node4
>  local address:  LID 0x0050, QPN 0x040404, PSN 0xd70996
>  remote address: LID 0x0061, QPN 0x050404, PSN 0xe5357a
> Failed status 12 for wr_id 2

If I counted correctly, work completion status 12 means
IBV_WC_RETRY_EXC_ERR, as you can see in
src/userspace/libibverbs/include/infiniband/verbs.h:

enum ibv_wc_status {
	IBV_WC_SUCCESS,
	IBV_WC_LOC_LEN_ERR,
	IBV_WC_LOC_QP_OP_ERR,
	IBV_WC_LOC_EEC_OP_ERR,
	IBV_WC_LOC_PROT_ERR,
	IBV_WC_WR_FLUSH_ERR,
	IBV_WC_MW_BIND_ERR,
	IBV_WC_BAD_RESP_ERR,
	IBV_WC_LOC_ACCESS_ERR,
	IBV_WC_REM_INV_REQ_ERR,
	IBV_WC_REM_ACCESS_ERR,
	IBV_WC_REM_OP_ERR,
	IBV_WC_RETRY_EXC_ERR,
	IBV_WC_RNR_RETRY_EXC_ERR,
	IBV_WC_LOC_RDD_VIOL_ERR,
	IBV_WC_REM_INV_RD_REQ_ERR,
	IBV_WC_REM_ABORT_ERR,
	IBV_WC_INV_EECN_ERR,
	IBV_WC_INV_EEC_STATE_ERR,
	IBV_WC_FATAL_ERR,
	IBV_WC_RESP_TIMEOUT_ERR,
	IBV_WC_GENERAL_ERR
};
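
To avoid counting by hand, a small lookup table can do the same job. This
is only a sketch: wc_status_name() and wc_status_names[] are hypothetical
names, not part of libibverbs, and the table assumes the enum order shown
above stays unchanged.

```c
#include <stddef.h>

/* Names in the same order as enum ibv_wc_status above, so the
 * array index equals the numeric status value. */
static const char *const wc_status_names[] = {
	"IBV_WC_SUCCESS",
	"IBV_WC_LOC_LEN_ERR",
	"IBV_WC_LOC_QP_OP_ERR",
	"IBV_WC_LOC_EEC_OP_ERR",
	"IBV_WC_LOC_PROT_ERR",
	"IBV_WC_WR_FLUSH_ERR",
	"IBV_WC_MW_BIND_ERR",
	"IBV_WC_BAD_RESP_ERR",
	"IBV_WC_LOC_ACCESS_ERR",
	"IBV_WC_REM_INV_REQ_ERR",
	"IBV_WC_REM_ACCESS_ERR",
	"IBV_WC_REM_OP_ERR",
	"IBV_WC_RETRY_EXC_ERR",
	"IBV_WC_RNR_RETRY_EXC_ERR",
	"IBV_WC_LOC_RDD_VIOL_ERR",
	"IBV_WC_REM_INV_RD_REQ_ERR",
	"IBV_WC_REM_ABORT_ERR",
	"IBV_WC_INV_EECN_ERR",
	"IBV_WC_INV_EEC_STATE_ERR",
	"IBV_WC_FATAL_ERR",
	"IBV_WC_RESP_TIMEOUT_ERR",
	"IBV_WC_GENERAL_ERR"
};

/* Hypothetical helper (not part of libibverbs): map a numeric
 * completion status to its enum name, or "unknown" if the value
 * is outside the table. */
static const char *wc_status_name(int status)
{
	size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

	if (status < 0 || (size_t)status >= n)
		return "unknown";
	return wc_status_names[status];
}
```

Calling wc_status_name(12) with the status from the failed run returns
"IBV_WC_RETRY_EXC_ERR".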

Perhaps the problem is on node4. I'm not 100% sure, but I think this
error can occur when node4 has not set up its IB resources (QPs, etc.)
properly. Do you see any errors on node4, too?

Regards,
	Heiko


