[openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

Eric Barton eeb at bartonsoftware.com
Tue Aug 1 23:04:47 PDT 2006


I've had a report of rdma_connect() failing with a callback event type of
RDMA_CM_EVENT_UNREACHABLE and status -ETIMEDOUT although the peer node was
up and running at the time.

It seems this can be reproduced as follows...

1. Establish a connection between nodes A and B

2. Reboot node A

3. Start establishing a new connection from node A to node B

4. After a timeout, the CM callback on node A fires with the event type and
   status described above.
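
For illustration, the event and status seen in step 4 could be logged in one
line with something like the following. This is a minimal userspace sketch,
not code from the OpenFabrics stack: format_cm_event() is a hypothetical
helper, and it assumes the status is a negative errno value, as in the
-ETIMEDOUT case reported here.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not an OpenFabrics
 * API): format one log line for a CM callback from the event name and the
 * status delivered with it.
 *
 * Assumes a negative status is a negated errno code, so it is negated
 * again before being passed to strerror().
 *
 * Returns the value of snprintf(), i.e. the length the full line needs.
 */
int format_cm_event(char *buf, size_t len, const char *event_name, int status)
{
    const char *reason = (status < 0) ? strerror(-status) : "no error";

    return snprintf(buf, len, "cm event %s status %d (%s)",
                    event_name, status, reason);
}
```

Called as format_cm_event(buf, sizeof(buf), "RDMA_CM_EVENT_UNREACHABLE",
-ETIMEDOUT), this yields a line with the event name, the raw status, and its
errno text. A kernel-side equivalent would need printk() and has no
strerror(), so only the numeric status could be printed there.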

Could this happen with a buggy SM?  Are there some good places in the
OpenFabrics stack to add printks to help point the finger (or can some
existing debug/trace code be enabled)?

-- 
Cheers,

             Eric





