[ofa-general] troubleshooting IB_CM_REJ_INVALID_SERVICE_ID in RDMA_CM_EVENT_REJECTED at active side of the connection

Or Gerlitz or.gerlitz at gmail.com
Fri Feb 6 13:11:24 PST 2009


On Thu, Feb 5, 2009 at 6:47 AM, Isaac Huang <He.Huang at sun.com> wrote:
> I got some RDMA_CM_EVENT_REJECTED errors at active sides (i.e. nodes
> Poking around in CM code told me that the passive side couldn't find a listener with
> requested service_id on the incoming device of the connection request.

For this rdma-cm event, the status field carries a value from
enum ib_cm_rej_reason, so I assume you were getting
IB_CM_REJ_INVALID_SERVICE_ID.
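
To make the status value readable in logs, something along these lines could help. It is only a sketch: the table is a hand-copied subset of the reject reason codes as I remember them from the kernel's ib_cm.h (they follow the IB spec REJ reason numbers), so please check it against your own headers. The helper name rej_reason_name is made up for the example.

```python
# Subset of enum ib_cm_rej_reason, copied by hand (assumption: verify the
# numbers against include/rdma/ib_cm.h on your system before relying on them).
IB_CM_REJ_REASONS = {
    1: "IB_CM_REJ_NO_QP",
    3: "IB_CM_REJ_NO_RESOURCES",
    4: "IB_CM_REJ_TIMEOUT",
    8: "IB_CM_REJ_INVALID_SERVICE_ID",
    10: "IB_CM_REJ_STALE_CONN",
    28: "IB_CM_REJ_CONSUMER_DEFINED",
}


def rej_reason_name(status):
    """Translate the status of an RDMA_CM_EVENT_REJECTED event into a name.

    `status` is the integer taken from the event's status field; values
    missing from the table are reported as unknown rather than guessed.
    """
    return IB_CM_REJ_REASONS.get(status, "unknown reject reason %d" % status)
```

With this, a status of 8 at the active side reads as IB_CM_REJ_INVALID_SERVICE_ID, which is the case discussed here.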

> Could you guys give me some tips for troubleshooting? Any
> debugging options or /proc file to look at? Is there any netstat-like
> tool (e.g. something like a "netstat -ltp" to find out who is
> listening on which device)?

Yes, this is a real pain: currently there's no netstat-like support
for RDMA connections.

> The other possible cause could be ARP flux, but unfortunately arping
> via IPoIB always segfault on our systems. Is there any other way to
> troubleshoot possible ARP flux issues?

Yes, ping can serve you in that respect: ping the node, then look at
the resulting neighbour entries with $ ip neigh show and compare them
with the output of $ ip addr show on the system you are pinging. Your
problem may be solved by setting the arp_ignore sysctl attribute
correctly; take a look at the known issues section in the IPoIB release
notes provided with the ofed-docs package.
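
If you have many nodes to check, the comparison can be scripted. Below is a rough sketch, not a tested tool: it takes the text of "ip neigh show" from the pinging host and the text of "ip addr show" from the pinged host, and reports IPv4 addresses whose neighbour entry carries a different link-layer address than the interface that owns the address. The function names are invented for the example, and it assumes the usual iproute2 text layout and a plain string comparison of the whole link-layer address.

```python
import re


def parse_neigh(text):
    """Map IP -> link-layer address from `ip neigh show` output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Entries without a resolved address (e.g. FAILED) have no lladdr.
        if "lladdr" in fields and fields.index("lladdr") + 1 < len(fields):
            table[fields[0]] = fields[fields.index("lladdr") + 1].lower()
    return table


def parse_addr(text):
    """Map IPv4 address -> (interface, link-layer address) from `ip addr show`."""
    owners = {}
    iface = lladdr = None
    for line in text.splitlines():
        head = re.match(r"^\d+:\s+([^:@\s]+)", line)  # e.g. "4: ib0: <...>"
        fields = line.split()
        if head:
            iface, lladdr = head.group(1), None
        elif len(fields) > 1 and fields[0].startswith("link/"):
            lladdr = fields[1].lower()
        elif len(fields) > 1 and fields[0] == "inet":
            owners[fields[1].split("/")[0]] = (iface, lladdr)
    return owners


def find_flux(neigh_text, addr_text):
    """Return (ip, iface, expected_lladdr, seen_lladdr) for each mismatch."""
    neigh = parse_neigh(neigh_text)
    mismatches = []
    for ip, (iface, lladdr) in parse_addr(addr_text).items():
        seen = neigh.get(ip)
        if seen is not None and lladdr is not None and seen != lladdr:
            mismatches.append((ip, iface, lladdr, seen))
    return mismatches
```

An empty result means every neighbour entry points at the interface that really owns the address; any tuple returned is a candidate for ARP flux and worth checking against the arp_ignore setting.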

Or.


