[ofa-general] troubleshooting IB_CM_REJ_INVALID_SERVICE_ID in RDMA_CM_EVENT_REJECTED at active side of the connection
Or Gerlitz
or.gerlitz at gmail.com
Fri Feb 6 13:11:24 PST 2009
On Thu, Feb 5, 2009 at 6:47 AM, Isaac Huang <He.Huang at sun.com> wrote:
> I got some RDMA_CM_EVENT_REJECTED errors at active sides (i.e. nodes
> Poking around in CM code told me that the passive side couldn't find a listener with
> requested service_id on the incoming device of the connection request.
For this rdma-cm event, the status field holds a value from
enum ib_cm_rej_reason, so I assume you were getting
IB_CM_REJ_INVALID_SERVICE_ID.
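On InfiniBand, the reject reason arrives in the status member of the struct rdma_cm_event that rdma_get_cm_event() hands back. Turning that number into a name makes such rejects much easier to spot in logs. The sketch below is a Python illustration only; a real client would do this in C against librdmacm. The enum is a hand-copied subset of enum ib_cm_rej_reason, with values taken from the kernel's include/rdma/ib_cm.h, so check it against your own headers. The names IbCmRejReason and describe_reject are hypothetical, and other transports may report the status differently.

```python
from enum import IntEnum


class IbCmRejReason(IntEnum):
    """Hypothetical mirror of a subset of enum ib_cm_rej_reason.

    Values copied from the kernel's include/rdma/ib_cm.h; verify them
    against the headers on your system.
    """
    NO_QP = 1
    NO_EEC = 2
    NO_RESOURCES = 3
    TIMEOUT = 4
    UNSUPPORTED = 5
    INVALID_COMM_ID = 6
    INVALID_COMM_INSTANCE = 7
    INVALID_SERVICE_ID = 8
    INVALID_TRANSPORT_TYPE = 9
    STALE_CONN = 10
    CONSUMER_DEFINED = 28


def describe_reject(status: int) -> str:
    """Map the status of an RDMA_CM_EVENT_REJECTED event to a reason name."""
    try:
        return "IB_CM_REJ_" + IbCmRejReason(status).name
    except ValueError:
        # Codes outside the copied subset are reported numerically.
        return f"unknown reject reason {status}"


if __name__ == "__main__":
    print(describe_reject(8))  # IB_CM_REJ_INVALID_SERVICE_ID
```

A status of 8 points at the passive side: nothing was listening on the requested service ID on the device the request arrived on. The rdma-cm derives that service ID from the port space and the destination port.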
> Could you guys give me some tips for troubleshooting? Any
> debugging options or /proc file to look at? Is there any netstat-like
> tool (e.g. something like a "netstat -ltp" to find out who is
> listening on which device)?
Yes, this is a pain in the ass; currently there's no netstat-like
support for RDMA connections.
> The other possible cause could be ARP flux, but unfortunately arping
> via IPoIB always segfault on our systems. Is there any other way to
> troubleshoot possible ARP flux issues?
Yes, ping can serve you in that respect: just use it, then look at the
resulting neighbours by running "ip neigh show" and compare them with
the output of "ip addr show" on the system you are pinging. Your
problem may be solved by setting the arp_ignore sysctl correctly; take
a look at the known issues section in the IPoIB release notes provided
with the ofed-docs package.
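The ping-based check can be scripted. The sketch below is a Python illustration using hypothetical helper names (parse_neigh, parse_addr, who_answers). It assumes the default text layout of "ip neigh show" and "ip addr show", and it assumes you have copied the "ip addr show" output of the pinged host into a local file. It finds which remote interface owns the pinged IP and which remote interface's hardware address your neighbour cache holds for that IP. If the two differ, the wrong port answered the ARP request, which is the signature of ARP flux.

```python
import subprocess
import sys


def parse_neigh(text: str) -> dict:
    """Map IP -> link-layer address from `ip neigh show` output."""
    neigh = {}
    for line in text.splitlines():
        fields = line.split()
        # Entries without "lladdr" (e.g. FAILED/INCOMPLETE) are skipped.
        if "lladdr" in fields:
            neigh[fields[0]] = fields[fields.index("lladdr") + 1].lower()
    return neigh


def parse_addr(text: str) -> dict:
    """Map interface -> {"link": hw address, "inet": [IPv4 addresses]}
    from `ip addr show` output."""
    ifaces = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line[0].isdigit():
            # Header line such as "3: ib0: <BROADCAST,...>"; drop ':' and "@parent".
            current = fields[1].rstrip(":").split("@")[0]
            ifaces[current] = {"link": None, "inet": []}
        elif current and fields[0].startswith("link/") and len(fields) > 1:
            ifaces[current]["link"] = fields[1].lower()
        elif current and fields[0] == "inet":
            ifaces[current]["inet"].append(fields[1].split("/")[0])
    return ifaces


def who_answers(ip: str, neigh_text: str, remote_addr_text: str):
    """Return (owner, answerer): the remote interface configured with `ip`,
    and the remote interface whose hardware address the local neighbour
    cache holds for `ip` (None if unknown). Differing names suggest ARP flux."""
    lladdr = parse_neigh(neigh_text).get(ip)
    owner = answerer = None
    for name, info in parse_addr(remote_addr_text).items():
        if ip in info["inet"]:
            owner = name
        if lladdr is not None and info["link"] == lladdr:
            answerer = name
    return owner, answerer


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (after pinging the target): python3 arpcheck.py <ip> <remote-ip-addr-file>
    target, remote_file = sys.argv[1], sys.argv[2]
    local_neigh = subprocess.run(["ip", "neigh", "show"], capture_output=True,
                                 text=True, check=True).stdout
    with open(remote_file) as f:
        owner, answerer = who_answers(target, local_neigh, f.read())
    print(f"{target} is configured on {owner}; neighbour entry points at {answerer}")
    if owner and answerer and owner != answerer:
        print("possible ARP flux")
```

Run ping first so that the neighbour entry exists. For reference, the kernel's ip-sysctl documentation describes arp_ignore=1 as "reply only if the target IP address is local address configured on the incoming interface", which is the behaviour that prevents this kind of flux. The IPoIB release notes remain the place to check for the setting OFED recommends.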
Or.