[ewg] [perkinjo at cse.ohio-state.edu: ibv_open_xrc_domain error]

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Mar 8 06:41:18 PST 2013


I initially sent the forwarded message to linux-rdma, but perhaps it's
better to ask this question here.

----- Forwarded message from Jonathan Perkins <perkinjo at cse.ohio-state.edu> -----

Date: Wed, 6 Mar 2013 14:29:50 -0500
From: Jonathan Perkins <perkinjo at cse.ohio-state.edu>
To: linux-rdma at vger.kernel.org
Subject: ibv_open_xrc_domain error
User-Agent: Mutt/1.5.21 (2010-09-15)

Dear list:
Recently we have experienced failures when calling ibv_open_xrc_domain,
which returns an invalid parameter error code.  The failure started to
appear randomly after upgrading the kernel to 2.6.32-279.19.1.el6.x86_64,
and it seems to require us to reboot the node.  Whenever this happens we
notice that /var/log/messages contains many of the following messages...

Feb 26 20:56:07 magny4 kernel: mlx4_core 0000:02:00.0: mlx4_eq_int: MLX4_EVENT_TYPE_SRQ_LIMIT
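
To gauge how often these messages appear on each node, something like
the following could tally them.  This is only a rough sketch: the
function name count_srq_limit_events and the regular expression are
illustrative assumptions based on the single log line quoted above, and
they assume each entry sits on one line in the traditional syslog
format.

```python
import re
from collections import Counter

# Assumed pattern, modeled on the one log line quoted above:
# "<Mon> <day> <hh:mm:ss> <host> kernel: mlx4_core <pci-dev>: mlx4_eq_int: MLX4_EVENT_TYPE_SRQ_LIMIT"
SRQ_LIMIT_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+kernel:\s+"
    r"mlx4_core\s+(?P<device>\S+):\s+mlx4_eq_int:\s*MLX4_EVENT_TYPE_SRQ_LIMIT"
)


def count_srq_limit_events(lines):
    """Count SRQ_LIMIT event messages per (host, PCI device) pair.

    `lines` is any iterable of log lines, for example an open file.
    Lines that do not match the assumed pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = SRQ_LIMIT_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("device"))] += 1
    return counts
```

Passing an open handle on /var/log/messages to count_srq_limit_events
gives a Counter keyed by host and PCI device, which may help correlate
bursts of these events with the failing calls.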

Does anyone have any idea of what may be going wrong or how to debug
this issue?

Also, we've noticed that there is no user-space XRC support in OFED-3.5.
Will this support be added back in a future release?

Below is some information about our setup.

OFED-1.5.4.1
RHEL6 (2.6.32-279.19.1.el6.x86_64)

We're using many different platforms, but here are two of them that show
the error.

Platform A:
CPU: AMD Magny-Cours
HCA: Mellanox ConnectX VPI MT26428

Platform B:
CPU: Intel Kentsfield
HCA: Mellanox ConnectX VPI MT25418

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


----- End forwarded message -----

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo



